Joan Serrà

Welcome to my personal web page.

You can also find me on X, Google Scholar, LinkedIn, or GitHub (in order of usage).

Links to page content

Bio: short biography.
Publications: there you can go to preprints, recent (2021-current), past (2011-2020), or prehistoric (before 2011) publications/patents.
Talks: recent and past talks.
Misc: there you can go to experience/education, scientific service, projects, merits, teaching, or students/interns.
Contact: physical address, email, and map.

Short bio

I am a staff research scientist and team lead with Sony AI (since 2024). I do research on machine learning, with a focus on audio and multimedia analysis, synthesis, and retrieval. I was born in Riudarenes, Girona (1980). I did an MSc and PhD in machine learning for audio at the Music Technology Group of Universitat Pompeu Fabra (2006-2011) and a postdoc in artificial intelligence at IIIA-CSIC (2011-2015). After that, I joined Telefónica R&D as a machine learning researcher (2015-2019) and Dolby Laboratories as an AI researcher and manager (2019-2024). I have had research stays at the Max Planck Institute for the Physics of Complex Systems (2010) and the Max Planck Institute for Computer Science (2011). I have been involved in several research projects, co-invented over 20 patents, and co-authored over 150 publications, many of them highly cited and/or in top-tier venues. I occasionally act as a reviewer or area chair for some of those venues (provided the articles are free to access and free of publication charges), and I give talks and lectures on subjects that interest me (lately, mostly representation learning and generative modeling).

Publications and Patents

Preprints

Automatic music mixing using a generative model of effect embeddings
E. Moliner, M.A. Martínez-Ramírez, J. Koo, W.-H. Liao, K.W. Cheuk, J. Serrà, V. Välimäki, & Y. Mitsufuji.
ArXiv, 2511.08040. Nov 2025.
[arxiv] [demo] [code]

Towards blind data cleaning: a case study in music source separation
A. Gui, W. Choi, J. Koo, K. Shimada, T. Shibuya, J. Serrà, W.-H. Liao, & Y. Mitsufuji.
ArXiv, 2510.15409. Oct 2025.
[arxiv]

Automatic music sample identification with multi-track contrastive learning
A. Riou, J. Serrà, & Y. Mitsufuji.
ArXiv, 2510.11507. Oct 2025.
[arxiv] [code] [checkpoint]

Leveraging Whisper embeddings for audio-based lyrics matching
E. Mancini, J. Serrà, P. Torroni, & Y. Mitsufuji.
ArXiv, 2510.08176. Oct 2025.
[arxiv] [code] [checkpoints]

Attribution-by-design: ensuring inference-time provenance in generative music systems
F. Morreale, W. Hutiri, J. Serrà, A. Xiang, & Y. Mitsufuji.
ArXiv, 2510.08062. Oct 2025.
[arxiv]

Recent (2021-current)

2025

System and method for attributing an output of a generative artificial intelligence (AI) system
F. Morreale, J. Serrà, Y. Mitsufuji, W. Hutiri, & A. Xiang.
Patent US-19/341,289 (Sep 26, 2025).

Enhancing neural audio fingerprint robustness to audio degradation for music identification
R.O. Araz, G. Cortès-Sebastià, E. Molina, J. Serrà, X. Serra, Y. Mitsufuji, & D. Bogdanov.
Int. Soc. for Music Information Retrieval Conf. (ISMIR), in press. Oct 2025.
[arxiv] [code] [checkpoints]

A comprehensive real-world assessment of audio watermarking algorithms: will they survive neural codecs?
Y. Özer*, W. Choi*, J. Serrà, M.K. Singh, W.-H. Liao, & Y. Mitsufuji.
Conf. of the Int. Speech Communication Assoc. (INTERSPEECH), pp. 5113-5117. Aug 2025.
[arxiv] [isca] [code]

Large-scale training data attribution for music generative models via unlearning
W. Choi*, J. Koo*, K.W. Cheuk*, J. Serrà, M.A. Martínez-Ramírez, Y. Ikemiya, N. Murata, Y. Takida, W.-H. Liao, & Y. Mitsufuji.
NeurIPS Creative AI Track, Advances in Neural Information Processing Systems (NeurIPS), in press. Dec 2025.
(Also presented at AI Heard That! ICML Workshop on Machine Learning for Audio in Jul 2025)
[arxiv] [neurips] [presentation]

Supervised contrastive learning from weakly-labeled audio segments for musical version matching
J. Serrà, R.O. Araz, D. Bogdanov, & Y. Mitsufuji.
Int. Conf. on Machine Learning (ICML), pp. 53923-53939. Jul 2025.
[arxiv] [pmlr] [code] [checkpoints]

Joint semantic knowledge distillation and masked acoustic modeling for full-band speech restoration with improved intelligibility
X. Liu, X. Li, J. Serrà, & S. Pascual.
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP). Apr 2025.
[arxiv] [doi] [demo]

Sequential contrastive audio-visual learning
I. Tsiamas, S. Pascual, C. Yeh, & J. Serrà.
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP). Apr 2025.
[arxiv] [doi]

2024

Discogs-VINet-MIREX
R.O. Araz, J. Serrà, X. Serra, Y. Mitsufuji, & D. Bogdanov.
Music Information Retrieval Evaluation eXchange (MIREX). Nov 2024.
[mirex]

Masked generative video-to-audio transformers with enhanced synchronicity
S. Pascual, C. Yeh, I. Tsiamas, & J. Serrà.
European Conf. on Computer Vision (ECCV), pp. 247-264. Sep 2024.
[arxiv] [doi] [demo]

Joint semantic knowledge distillation and masked acoustic modeling for full-band speech restoration with improved intelligibility
X. Liu, X. Li, J. Serrà, & S. Pascual.
Patent ES-P202430693 (Sep 2, 2024), WO-2024-116230 (Sep 2, 2024).

GASS: generalizing audio source separation with large-scale data
J. Pons, X. Liu, S. Pascual, & J. Serrà.
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 546-550. Apr 2024.
[arxiv] [demo]

2023

System for audio-visual content trimming: highlight generation, moment retrieval and summarization for user generated content
G. Jagatap, G. KV, D. Chandran, J. Serrà, & A. Fanelli
Patent ES-P202331088 (Dec 27, 2023), US-63/561206 (Mar 4, 2024).

Mono-to-stereo through parametric stereo generation
J. Serrà, D. Scaini, S. Pascual, D. Arteaga, J. Pons, J. Breebaart, & G. Cengarle
Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 304-310. Nov 2023.
[arxiv] [doi]

CLIPSonic: text-to-audio synthesis with unlabeled videos and pretrained language-vision models
H.-W. Dong, X. Liu, J. Pons, G. Bhattacharya, S. Pascual, J. Serrà, T. Berg-Kirkpatrick, & J. McAuley
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Oct 2023.
[arxiv] [doi] [demo]

Upsampling layers for music source separation
J. Pons, J. Serrà, S. Pascual, G. Cengarle, D. Arteaga, & D. Scaini
European Signal Processing Conf. (EUSIPCO), pp. 311-315. Sep 2023.
[arxiv] [doi] [demo]

Full-band general audio synthesis with score-based diffusion
S. Pascual, G. Bhattacharya, C. Yeh, J. Pons, & J. Serrà
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP). Jun 2023.
[arxiv] [doi] [demo]

Quantitative evidence on overlooked aspects of enrollment speaker embeddings for target speaker separation
X. Liu, X. Li, & J. Serrà
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP). Jun 2023.
[arxiv] [doi]

Adversarial permutation invariant training for universal sound separation
E. Postolache, J. Pons, S. Pascual, & J. Serrà
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP). Jun 2023.
[arxiv] [doi] [demo]

Source separation and audio analytics for audio editing
J. Pons, J. Serrà, S. Pascual, & G. Cengarle
Patent ES-P202330336 (Apr 28, 2023), US-63/512218 (Jul 6, 2023), EP-23183758.4 (Jul 6, 2023).

Learning text-queried sound synthesis without using text-audio pairs
H.-W. Dong, X. Liu, J. Pons, G. Bhattacharya, S. Pascual, & J. Serrà
Patent ES-P202330333 (Apr 27, 2023).

Machine learning methods for generating parametric stereo parameters from mono signals
J. Serrà, D. Scaini, S. Pascual, D. Arteaga, J. Pons, J. Breebaart, & G. Cengarle
Patent ES-P202330275 (Apr 23, 2023), US-63/499282 (May 1, 2023).

2022

Adversarial permutation invariant training for universal sound separation
J. Pons, E. Postolache, S. Pascual, & J. Serrà
Patent ES-P202230890 (Oct 17, 2022), US-63/440568 (Jan 23, 2023), EP-23/075668 (Sep 18, 2023).

End-to-end general audio synthesis with generative networks
S. Pascual, J. Serrà, G. Bhattacharya, C. Yeh, & J. Pons
Patent ES-P202230889 (Oct 17, 2022), US-63/433650 (Dec 19, 2022), US-23/34098 (Sep 29, 2023).

Universal speech enhancement with score-based diffusion
J. Serrà, S. Pascual, J. Pons, R.O. Araz, & D. Scaini
Technical report. Jun 2022.
[arxiv] [examples]

On loss functions and evaluation metrics for music source separation
E. Gusó, J. Pons, S. Pascual, & J. Serrà
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 306-310. May 2022.
[arxiv] [doi]

Self-supervised perceptual audio encoding by mixing discriminative and reconstructive tasks
S. Pascual, J. Serrà, & J. Pons
Patent ES-P202230230 (Mar 18, 2022), ES-P202230275 (Mar 25, 2022), US-63/407287 (Sep 16, 2022), US-63/424523 (Nov 11, 2022).

Assessing algorithmic biases for musical version identification
F. Yesiler, M. Miron, J. Serrà, & E. Gómez
ACM Int. Conf. on Web Search and Data Mining (WSDM), pp. 1284-1290. Feb 2022.
[arxiv] [doi] [data,code]

Lognormals, power laws and double power laws in the distribution of frequencies of harmonic codewords from classical music
M. Serra-Peralta, J. Serrà, & A. Corral
Scientific Reports 12, 2615. Feb 2022.
[arxiv] [doi] [code]

2021

Audio-based musical version identification: elements and challenges
F. Yesiler, G. Doras, R.M. Bittner, C. Tralie, & J. Serrà
IEEE Signal Processing Magazine 38(6): 115-136. Nov 2021.
[arxiv] [doi] [web]

Adversarial auto-encoding for packet loss concealment
S. Pascual, J. Serrà, & J. Pons
IEEE Workshop on Appl. of Signal Proc. to Audio and Acoustics (WASPAA), pp. 71-75. Oct 2021.
[arxiv] [doi]

Universal speech enhancement with generative neural networks
J. Serrà, S. Pascual, & J. Pons
Patent ES-P202130914 (Sep 29, 2021), US-63/287207 (Dec 8, 2021), ES-P202230427 (May 18, 2022), US-63/392575 (Jul 27, 2022), PCT/EP22/77144 (Sep 29, 2022).

Heaps’ law and vocabulary richness in the history of classical music harmony
M. Serra-Peralta, J. Serrà, & A. Corral
EPJ Data Science 10: 40. Aug 2021.
[arxiv] [doi] [code]

Upsampling layers for audio synthesis
J. Pons, J. Serrà, S. Pascual, G. Cengarle, D. Arteaga, & D. Scaini
Patent ES-P202130417 (May 7, 2021), US-63/220279 (Jul 9, 2021).

On tuning consistent annealed sampling for denoising score matching
J. Serrà, S. Pascual, & J. Pons
Technical report. Apr 2021.
[arxiv]

Investigating the efficacy of music version retrieval systems for setlist identification
F. Yesiler, E. Molina, J. Serrà, & E. Gómez
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 541-545. Jun 2021.
[arxiv] [doi] [data,code]

Upsampling artifacts in neural audio synthesis
J. Pons, S. Pascual, G. Cengarle, & J. Serrà
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 3005-3009. Jun 2021.
[arxiv] [doi] [code]

Automatic multitrack mixing with a differentiable mixing console of neural audio effects
C.J. Steinmetz, J. Pons, S. Pascual, & J. Serrà
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 71-75. Jun 2021.
[arxiv] [doi] [samples,scripts]

SESQA: semi-supervised learning for speech quality assessment
J. Serrà, J. Pons, & S. Pascual
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 381-385. Jun 2021.
[arxiv] [doi]

Past (2011-2020)

2020

A computer-implemented method for detecting anomalous behaviors of electronic devices and computer programs thereof
M. Carós Roca, A. Lutu, J. Serrà, & D. Perino
Patent EP-20383109.4A (Dec 17, 2020).
[gpatents]

Real-time packet loss concealment using deep generative networks
S. Pascual, J. Serrà, & J. Pons
Patent ES-P202031040 (Oct 15, 2020), US-63/126,123 (Dec 16, 2020), ES-P202130258 (Mar 24, 2021), US-63/195831 (Jun 2, 2021), PCT/EP21/78443 (Oct 14, 2021).
[gpatents]

Less is more: faster and better music version identification with embedding distillation
F. Yesiler, J. Serrà, & E. Gómez
Int. Soc. for Music Information Retrieval Conf. (ISMIR). Oct 2020.
[arxiv] [ismir]

Combining musical features for cover detection
G. Doras, F. Yesiler, J. Serrà, E. Gómez, & G. Peeters
Int. Soc. for Music Information Retrieval Conf. (ISMIR). Oct 2020.
[zenodo] [ismir]

Experience: advanced network operations in (un-)connected remote communities
D. Perino, X. Yang, J. Serrà, A. Lutu, & I. Leontiadis
ACM Int. Conf. on Mobile Computing and Networking (MobiCom), num. 1. Sep 2020.
[acm] [doi]

Method for learning an audio quality metric combining labeled and unlabeled data
J. Serrà, J. Pons, & S. Pascual
Patent ES-P202030605 (Jun 22, 2020), EP-21732931.7 (Jun 21, 2021), JP-2022-579132 (Jun 21, 2021), US-18/012256 (Jun 21, 2021).
[gpatents]

System for automated multitrack mixing in the waveform domain with a learned differentiable mixing console and controller network
C.J. Steinmetz & J. Serrà
Patent ES-P202030604 (Jun 22, 2020), EP-21731213.1 (Jun 16, 2021), JP-2022-578976 (Jun 16, 2021), US-18/012245 (Jun 16, 2021).
[gpatents]

Accurate and scalable version identification using musically-motivated embeddings
F. Yesiler, J. Serrà, & E. Gómez
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 21-25. May 2020.
[arxiv] [doi] [code,model,eval]

Input complexity and out-of-distribution detection with likelihood-based generative models
J. Serrà, D. Álvarez, V. Gómez, O. Slizovskaia, J.F. Núñez, & J. Luque
Int. Conf. on Learning Representations (ICLR). Apr 2020.
[arxiv] [openreview] [video]

2019

Method for detecting permanent failures in mobile telecommunication networks
D. Perino, J. Serrà, & X. Yang
Patent EP-19383101.3A (Dec 12, 2019).
[gpatents]

Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion
J. Serrà, S. Pascual, & C. Segura
Advances in Neural Information Processing Systems (NeurIPS) 32: 6790-6800. Dec 2019.
[arxiv] [neurips] [code] [examples]

Towards generalized speech enhancement with generative adversarial networks
S. Pascual, J. Serrà, & A. Bonafonte
Conf. of the Int. Speech Communication Assoc. (INTERSPEECH), pp. 161-165. Sep 2019.
[arxiv] [doi] [code] [examples]

Learning problem-agnostic speech representations from multiple self-supervised tasks
S. Pascual, M. Ravanelli, J. Serrà, A. Bonafonte, & Y. Bengio
Conf. of the Int. Speech Communication Assoc. (INTERSPEECH), pp. 1791-1795. Sep 2019.
[arxiv] [doi] [code,model]

Time-domain speech enhancement using generative adversarial networks
S. Pascual, J. Serrà, & A. Bonafonte
Speech Communication 114: 10-21. Sep 2019.
[doi] [code] [examples1,examples2]

Exploring efficient neural architectures for linguistic-acoustic mapping in text-to-speech
S. Pascual, J. Serrà, & A. Bonafonte
Applied Sciences 9(16): 3391. Aug 2019.
[doi] [code]

Training neural audio classifiers with few data
J. Pons, J. Serrà, & X. Serra
IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 16-20. May 2019.
[arxiv] [doi] [code]

2018

When the state of the art is ahead of the state of understanding: unintuitive properties of deep neural networks
J. Serrà
Métode Science Studies Journal 99: 13-17. Dec 2018.
[uv] [doi]

There goes Wally: anonymously sharing your location gives you away
A. Pyrgelis, N. Kourtellis, I. Leontiadis, J. Serrà, & C. Soriente
IEEE Int. Conf. on Big Data (BigData), pp. 1218-1227. Dec 2018.
[arxiv] [doi]

Real non-volume preserving voice conversion
S. Pascual, J. Serrà, & A. Bonafonte
LXAI Research Workshop (NeurIPS-LXAI). Dec 2018.
[talp] [lxai]

Self-attention linguistic-acoustic decoder
S. Pascual, A. Bonafonte, & J. Serrà
IberSPEECH Conf., pp. 152-156. Nov 2018.
[arxiv] [isca]

Whispered-to-voiced alaryngeal speech conversion with generative adversarial networks
S. Pascual, A. Bonafonte, J. Serrà, & J.A. Gonzalez
IberSPEECH Conf., pp. 117-121. Nov 2018.
[arxiv] [isca] [code]

Towards a universal neural network encoder for time series
J. Serrà, S. Pascual, & A. Karatzoglou
Int. Conf. of the Catalan Association for Artificial Intelligence (CCIA), Frontiers in Artificial Intelligence and Applications 308, pp. 120-129. Oct 2018.
[arxiv] [ios]

MobInsight: a framework using semantic neighborhood features for localized interpretations of urban mobility
S. Park, J. Serrà, E. Frias-Martinez, & N. Oliver
ACM Trans. on Interactive Intelligent Systems 8(3): 23. Jul 2018.
[arxiv] [doi] [demo]

Overcoming catastrophic forgetting with hard attention to the task
J. Serrà, D. Surís, M. Miron, & A. Karatzoglou
Int. Conf. on Machine Learning (ICML) 80: 4555-4564. Jul 2018.
[arxiv] [pmlr] [code]

Empirical evidence on daily cash flow time series and its implications for forecasting
F. Salas-Molina, J.A. Rodríguez-Aguilar, J. Serrà, M. Guillen, & F.J. Martín
Statistics and Operations Research Transactions 42(1): 73-98. Jun 2018.
[arxiv] [doi] [data]

Language and noise transfer in speech enhancement generative adversarial network
S. Pascual, M. Park, J. Serrà, A. Bonafonte, & K.-H. Ahn
IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 5019-5023. Apr 2018.
[arxiv] [doi]

Unintuitive properties of deep neural networks
J. Serrà
Proc. of the EC Workshop on Human Behaviour and Machine Intelligence (HUMAINT), pp. 11-12. Mar 2018.
[ec]

2017

Continual prediction of notification attendance with classical and deep network approaches
K. Katevas, I. Leontiadis, M. Pielot, & J. Serrà
Technical report. Dec 2017.
[arxiv]

Beyond interruptibility: predicting opportune moments to engage mobile phone users
M. Pielot, B. Cardoso, K. Katevas, J. Serrà, A. Matic, & N. Oliver
Proc. of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) 1(3): 91. Sep 2017. Presented at UbiComp 2017.
[pielot] [doi]

Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks
J. Serrà & A. Karatzoglou
ACM Conf. on Recommender Systems (RECSYS), pp. 279-287. Aug 2017.
[arxiv] [doi]

SEGAN: speech enhancement generative adversarial network
S. Pascual, A. Bonafonte, & J. Serrà
Conf. of the Int. Speech Communication Assoc. (INTERSPEECH), pp. 3642-3646. Aug 2017.
[arxiv] [doi] [code] [examples]

Class-based prediction errors to detect hate speech with out-of-vocabulary words
J. Serrà, I. Leontiadis, D. Spathis, G. Stringhini, J. Blackburn, & A. Vakali
Workshop on Abusive Language Online (ALW), Conf. of the Association for Computational Linguistics (ACL), pp. 36-40. Aug 2017.
[openreview] [acl]

Practical processing of mobile sensor data for continual deep learning predictions
K. Katevas, I. Leontiadis, M. Pielot, & J. Serrà
Workshop on Deep Learning for Mobile Systems and Applications (DeepMobile), ACM Int. Conf. on Mobile Systems, Applications and Services (MOBISYS), pp. 19-24. Jun 2017.
[arxiv] [doi]

Compact embedding of binary-coded inputs and outputs using Bloom filters
J. Serrà & A. Karatzoglou
Int. Conf. on Learning Representations (ICLR) Workshop. Apr 2017.
[openreview]

The good, the bad, and the KPIs: how to combine performance metrics to better capture under-performing sectors in mobile networks
I. Leontiadis, J. Serrà, A. Finamore, G. Dimopoulos, & K. Papagiannaki
IEEE Int. Conf. on Data Engineering (ICDE), pp. 297-308. Apr 2017.
[ieee] [doi]

Hot or not? Forecasting cellular network hot spots using sector performance indicators
J. Serrà, I. Leontiadis, A. Karatzoglou, & K. Papagiannaki
IEEE Int. Conf. on Data Engineering (ICDE), pp. 259-270. Apr 2017.
[arxiv] [doi]

Empowering cash managers to achieve cost savings by improving predictive accuracy
F. Salas-Molina, F.J. Martín, J.A. Rodríguez-Aguilar, J. Serrà, & J.L. Arcos
International Journal of Forecasting 33(2): 403-415. Apr 2017.
[arxiv] [doi]

Performance metrics using KPI combinations to better capture underperforming sectors in mobile networks
I. Leontiadis, J. Serrà, & A. Finamore
Patent EP-17382164.6 (Mar 31, 2017).
[gpatents]

Forecast of cellular network hot spots using sector performance indicators
J. Serrà & I. Leontiadis
Patent EP-17382163.8 (Mar 31, 2017).
[gpatents]

Effect of acoustic conditions on algorithms to detect Parkinson’s disease from speech
J.C. Vásquez-Correa, J. Serrà, J.R. Orozco-Arroyave, J.F. Vargas-Bonilla, & E. Nöth
IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 5065-5069. Mar 2017.
[ieee] [doi]

2016

A genetic algorithm to discover flexible motifs with support
J. Serrà, A. Matic, J.L. Arcos, & A. Karatzoglou
Workshop on Spatial and Spatiotemporal Data Mining (SSTDM), IEEE Int. Conf. on Data Mining (ICDM), pp. 1153-1158. Dec 2016.
[arxiv] [doi] [code]

Time-delayed melody surfaces for raga recognition
S. Gulati, J. Serrà, K.K. Ganguli, S. Senturk, & X. Serra
Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 751-757. Aug 2016.
[mtg] [ismir]

Ranking and significance of variable-length similarity-based time series motifs
J. Serrà, I. Serra, A. Corral, & J.L. Arcos
Expert Systems with Applications 55: 452-460. Aug 2016.
[arxiv] [doi] [code]

What makes a city vital and safe: Bogotá case study
A. Bogomolov, A. Clavijo, M. De Nadai, R. Lara Molina, B. Lepri, E. Letouzé, N. Oliver, G. Pestre, J. Serrà, N. Shoup, & A. Ramirez Suarez
Annual Bank Conf. on Development Economics (ABCDE): Data and Development Economics, session 2D: Crime, Civil Wars, and Hotspots. Jun 2016.
[abcde1] [abcde2]

Phrase-based raga recognition using vector space modeling
S. Gulati, J. Serrà, V. Ishwar, S. Senturk, & X. Serra
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 66-70. Mar 2016.
[mtg] [doi] [code,data]

Discovering raga motifs by characterizing communities in networks of melodic patterns
S. Gulati, J. Serrà, V. Ishwar, & X. Serra
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 286-290. Mar 2016.
[mtg] [doi] [code,data]

Particle swarm optimization for time series motif discovery
J. Serrà & J.L. Arcos
Knowledge-Based Systems 92: 127-137. Jan 2016.
[arxiv] [doi] [code]

2015

Improving melodic similarity in Indian art music using culture specific melodic characteristics
S. Gulati, J. Serrà, & X. Serra
Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 680-686. Oct 2015.
[mtg] [ismir]

Analysis of the impact of a tag recommendation system in a real-world folksonomy
F. Font, J. Serrà, & X. Serra
ACM Trans. on Intelligent Systems and Technology 7(1): 6. Oct 2015.
[iiia] [doi]

Zipf-like distributions in language and music
I. Moreno, F. Font-Clos, J. Serrà, & A. Corral
Complexitat.cat Workshop. May 2015.
[complexitat.cat]

An evaluation of methodologies for melodic similarity in audio recordings of Indian art music
S. Gulati, J. Serrà, & X. Serra
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 678-682. Apr 2015.
[iiia] [doi]

2014

Mining melodic patterns in large audio collections of Indian art music
S. Gulati, J. Serrà, V. Ishwar, & X. Serra
Int. Conf. on Signal Image Technology and Internet Based Systems (SITIS), pp. 264-271. Nov 2014.
[iiia] [doi] [code,data]

Melodic pattern extraction in large collections of music recordings using time series mining techniques
S. Gulati, J. Serrà, V. Ishwar, & X. Serra
Demo at the Int. Soc. for Music Information Retrieval Conf. (ISMIR). Oct 2014.
[iiia] [ismir]

An empirical evaluation of similarity measures for time series classification
J. Serrà & J.L. Arcos
Knowledge-Based Systems 67: 305-314. Sep 2014.
[iiia] [doi]

Landmark detection in Hindustani music melodies
S. Gulati, J. Serrà, K.K. Ganguli, & X. Serra
Int. Computer Music Conf. / Sound and Music Computing Conf. (ICMC/SMC), vol. 2, pp. 1062-1068. Sep 2014.
[iiia] [icmc,smc] [data]

Class-based tag recommendation and user-based evaluation in online audio clip sharing
F. Font, J. Serrà, & X. Serra
Knowledge-Based Systems 67: 131-142. Sep 2014.
[iiia] [doi]

Unsupervised music structure annotation by time series structure features and segment similarity
J. Serrà, M. Müller, P. Grosche, & J.L. Arcos
IEEE Trans. on Multimedia, Special Issue on Music Data Mining 16(5): 1229-1240. Aug 2014.
[iiia] [doi] [code]

Intonation analysis of ragas in Carnatic music
G.K. Koduri, V. Ishwar, J. Serrà, & X. Serra
Journal of New Music Research, Special Issue on Computational Approaches to the Art Music Traditions of India and Turkey 43(1): 72-93. Mar 2014.
[iiia] [doi]

Audio clip classification using social tags and the effect of tag expansion
F. Font, J. Serrà, & X. Serra
AES Int. Conf. on Semantic Audio, paper num. 26. Jan 2014.
[iiia] [aes]

2013

Folksonomy-based tag recommendation for collaborative tagging systems
F. Font, J. Serrà, & X. Serra
Int. Journal on Semantic Web and Information Systems 9(2): 1-30. Nov 2013.
[iiia] [doi]

What can we learn from massive music archives?
J. Serrà
Dagstuhl Seminar 13451: Computational Audio Analysis. M. Müller, S. Narayanan, and B. Schuller, eds. Wadern, Germany. Nov 2013.
[iiia] [dagstuhl]

Learning of units and knowledge representation
F. Metze, X. Anguera, S. Ewert, J. Gemmeke, D. Kolossa, E. Mower Provost, B. Schuller, & J. Serrà
Dagstuhl Seminar 13451: Computational Audio Analysis. M. Müller, S. Narayanan, and B. Schuller, eds. Wadern, Germany. Nov 2013.
[iiia] [dagstuhl]

Source separation
C. Uhle, J. Driedger, B. Edler, S. Ewert, F. Graf, G. Kubin, M. Müller, N. Ono, B. Pardo, & J. Serrà
Dagstuhl Seminar 13451: Computational Audio Analysis. M. Müller, S. Narayanan, and B. Schuller, eds. Wadern, Germany. Nov 2013.
[iiia] [dagstuhl]

Towards cover group thumbnailing
P. Grosche, M. Müller, & J. Serrà
ACM Int. Conf. on Multimedia (ACM-MM), pp. 613-616. Oct 2013.
[iiia] [doi]

Sample identification in hip-hop music
J. Van Balen, J. Serrà, & M. Haro
From Sounds to Music and Emotions, M. Aramaki, M. Barthet, R. Kronland-Martinet, and S. Ystad eds., Lecture Notes in Computer Science, vol. 7900, ch. 5, pp. 301-312. Sep 2013.
[iiia] [doi]

Note onset deviations as musical piece signatures
J. Serrà, T.H. Özaslan, & J.L. Arcos
PLoS ONE 8(7): e69268. Jul 2013.
[plos] [doi]

Cognitive prognosis of acquired brain injury patients using machine learning techniques
J. Serrà, J.L. Arcos, A. García-Rudolph, A. García-Molina, T. Roig, & J.M. Tormos
Int. Conf. on Advanced Cognitive Technologies and Applications (COGNITIVE), pp. 108-113. May 2013.
[iiia] [csic]

Measuring quantitative trends in western popular music
J. Serrà, A. Corral, M. Boguñá, M. Haro, & J.L. Arcos
CRM-Imperial College Workshop on Complex Systems. Barcelona, Spain. Apr 2013.
[iiia] [crm]

Tonal representations for music retrieval: from version identification to query-by-humming
J. Salamon, J. Serrà, & E. Gómez
Int. Journal of Multimedia Information Retrieval 2(1): 45-58. Feb 2013.
[iiia] [doi]

2012

Structure-based audio fingerprinting for music retrieval
P. Grosche, J. Serrà, M. Müller, & J.L. Arcos
Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 55-60. Oct 2012.
[iiia] [ismir]

Folksonomy-based tag recommendation for online audio clip sharing
F. Font, J. Serrà, & X. Serra
Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 73-78. Oct 2012.
[iiia] [ismir]

Characterization of intonation in Carnatic music by parametrizing pitch histograms
G.K. Koduri, J. Serrà, & X. Serra
Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 199-204. Oct 2012.
[iiia] [ismir]

Extracting semantic information from an on-line Carnatic music forum
M. Sordo, J. Serrà, G.K. Koduri, & X. Serra
Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 355-360. Oct 2012.
[iiia] [ismir]

The importance of detecting boundaries in music structure annotation
J. Serrà, M. Müller, P. Grosche, & J.L. Arcos
Music Information Retrieval Evaluation eXchange (MIREX). Oct 2012.
[iiia] [mirex]

A competitive measure to assess the similarity between two time series
J. Serrà & J.L. Arcos
Int. Conf. on Case-Based Reasoning (ICCBR), Lecture Notes in Artificial Intelligence 7466, pp. 414-427. Sep 2012.
[iiia] [doi] [code]

The computer as music critic
J. Serrà & J.L. Arcos
The New York Times, pp. SR12. Sep 15, 2012.
[iiia] [nytimes]

Measuring the evolution of contemporary western popular music
J. Serrà, A. Corral, M. Boguñá, M. Haro, & J.L. Arcos
Scientific Reports 2: 521. Jul 2012.
[iiia] [doi]

Characterization and exploitation of community structure in cover song networks
J. Serrà, M. Zanin, P. Herrera, & X. Serra
Pattern Recognition Letters 33(9): 1032-1041. Jul 2012.
[arxiv] [doi]

Unsupervised detection of music boundaries by time series structure features
J. Serrà, M. Müller, P. Grosche, & J.L. Arcos
AAAI Int. Conf. on Artificial Intelligence (AAAI), pp. 1613-1619. Jul 2012.
[iiia] [aaai]

Extracting semantic information from on-line art music discussion forums
M. Sordo, J. Serrà, G.K. Koduri, & X. Serra
CompMusic Workshop. Jul 2012.
[iiia] [compmusic]

Computational analysis of intonation in Indian art music
G.K. Koduri, J. Serrà, & X. Serra
CompMusic Workshop. Jul 2012.
[iiia] [compmusic]

Automatic identification of samples in hip hop music
J. Van Balen, M. Haro, & J. Serrà
Int. Symp. on Computer Music Modeling and Retrieval (CMMR), pp. 544-551. Jun 2012.
[iiia] [cmmr]

Quantifying the evolution of popular music
J. Serrà, A. Corral, M. Boguñá, M. Haro, & J.L. Arcos
No Lineal Conf. Jun 2012.
[iiia] [nolineal]

Patterns, regularities, and evolution of contemporary popular music
J. Serrà, A. Corral, M. Boguñá, M. Haro, & J.L. Arcos
Complexitat.cat Workshop. May 2012.
[iiia] [complexitat.cat]

Power-law distribution in encoded MFCC frames of speech, music, and environmental sound signals
M. Haro, J. Serrà, A. Corral, & P. Herrera
Workshop on Advances in Music Information Research (AdMIRe), Int. World Wide Web Conf. (WWW), pp. 895-902. Apr 2012.
[iiia] [www]

Melody, bassline, and harmony representations for music version identification
J. Salamon, J. Serrà, & E. Gómez
Workshop on Advances in Music Information Research (AdMIRe), Int. World Wide Web Conf. (WWW), pp. 887-894. Apr 2012.
[iiia] [www]

Audio content-based music retrieval
P. Grosche, M. Müller, & J. Serrà
Multimodal Music Processing, M. Müller, M. Goto, and M. Schedl eds., Dagstuhl Follow-Ups, Dagstuhl Publishing, Wadern, Germany, vol. 3, ch. 9, pp. 157-174. Apr 2012.
[iiia] [dagstuhl]

Zipf’s law in short-time timbral codings of speech, music, and environmental sound signals
M. Haro, J. Serrà, P. Herrera, & A. Corral
PLoS ONE 7(3): e33993. Mar 2012.
[iiia] [doi]

Predictability of music descriptor time series and its application to cover song detection
J. Serrà, H. Kantz, X. Serra, & R.G. Andrzejak
IEEE Trans. on Audio, Speech and Language Processing 20(2): 514-525. Feb 2012.
[mtg] [doi]

2011

Identification of versions of the same musical composition: audio content-based approaches and post-processing steps
J. Serrà
LAP Lambert Academic Publishing, Saarbrücken, Germany. ISBN 978-3-8473-2785-1. Dec 2011.
[amazon] [bn]

Assessing the tuning of sung Indian classical music
J. Serrà, G.K. Koduri, M. Miron, & X. Serra
Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 263-268. Oct 2011.
[mtg] [ismir]

Computational approaches for the understanding of melody and rhythm in Carnatic music
G.K. Koduri, M. Miron, J. Serrà, & X. Serra
Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 157-162. Oct 2011.
[mtg] [ismir]

Unifying low-level and high-level music similarity measures
D. Bogdanov, J. Serrà, N. Wack, P. Herrera, & X. Serra
IEEE Trans. on Multimedia 13(4): 687-701. Aug 2011.
[mtg] [doi]

Method for calculating measures of similarity between time signals
J. Serrà
Patent US 2011/0178615, published July 21, 2011. Priority num. ES20090001057-20090423. Also published as ES 2354330 (Método para calcular medidas de similitud entre señales temporales).
[fpo] [espacenet]

Nonlinear audio recurrence analysis with application to genre classification
J. Serrà, C.A. De Los Santos, & R.G. Andrzejak
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 169-172. May 2011.
[mtg] [doi]

Identification of versions of the same musical composition by processing audio descriptions
J. Serrà
PhD Thesis. Universitat Pompeu Fabra, Barcelona, Spain. Mar 2011.
[mtg] [tdx]

Cover song networks: analysis and accuracy increase
J. Serrà, M. Zanin, & P. Herrera
Int. Journal of Complex Systems in Science 1: 55-59. Jan 2011.
[mtg]

[Back to top]

Prehistoric (before 2011)

2010

Model-based cover song detection via threshold autoregressive forecasts
J. Serrà, H. Kantz, & R.G. Andrzejak
Workshop on Music and Machine Learning (MML), ACM Int. Conf. on Multimedia (ACM-MM), pp. 13-16. Oct 2010.
[mtg] [doi]

Unsupervised accuracy improvement for cover song detection using spectral connectivity network
M. Lagrange & J. Serrà
Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 595-600. Aug 2010.
[mtg] [ismir]

Hybrid music similarity measure
D. Bogdanov, J. Serrà, N. Wack, & P. Herrera
Music Information Retrieval Evaluation eXchange (MIREX). Aug 2010.
[mtg] [mirex]

Music classification using high-level models
N. Wack, C. Laurier, O. Meyers, R. Marxer, D. Bogdanov, J. Serrà, E. Gómez, & P. Herrera
Music Information Retrieval Evaluation eXchange (MIREX). Aug 2010.
[mtg] [mirex]

Cover song networks: analysis and accuracy increase
J. Serrà, M. Zanin, & P. Herrera
Net-Works Int. Conf. Jun 2010.
[mtg] [net-works]

Indexing music by mood: design and integration of an automatic content-based annotator
C. Laurier, O. Meyers, J. Serrà, M. Blech, P. Herrera, & X. Serra
Multimedia Tools and Applications 48(1): 161-184. May 2010.
[mtg] [doi]

Audio cover song identification and similarity: background, approaches, evaluation, and beyond
J. Serrà, E. Gómez, & P. Herrera
Advances in Music Information Retrieval, Z. W. Ras and A. A. Wieczorkowska eds., Studies in Computational Intelligence series, Springer, Berlin, Germany, vol. 274, ch. 14, pp. 307-332. Mar 2010.
[mtg] [doi]

2009

From low-level to high-level: comparative study of music similarity measures
D. Bogdanov, J. Serrà, N. Wack, & P. Herrera
Workshop on Advances in Music Information Research (AdMIRe), IEEE Int. Symp. on Multimedia, pp. 453-458. Dec 2009.
[mtg] [doi]

Unsupervised detection of cover song sets: accuracy improvement and original identification
J. Serrà, M. Zanin, C. Laurier, & M. Sordo
Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 225-230. Oct 2009.
[mtg] [ismir]

Music mood representations from social tags
C. Laurier, M. Sordo, J. Serrà, & P. Herrera
Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 381-386. Oct 2009.
[mtg] [ismir]

The discipline formerly known as MIR
P. Herrera, J. Serrà, C. Laurier, E. Guaus, E. Gómez, & X. Serra
Int. Soc. for Music Information Retrieval Conf. (ISMIR), special session on the Future of MIR (fMIR). Oct 2009.
[mtg] [fmir]

Cover song retrieval by cross recurrence quantification and unsupervised set detection
J. Serrà, M. Zanin, & R.G. Andrzejak
Music Information Retrieval Evaluation eXchange (MIREX). Oct 2009.
[mtg] [mirex]

Music type groupers (MTG): generic music classification algorithms
N. Wack, E. Guaus, C. Laurier, O. Meyers, R. Marxer, D. Bogdanov, J. Serrà, & P. Herrera
Music Information Retrieval Evaluation eXchange (MIREX). Oct 2009.
[mtg] [mirex]

Hybrid similarity measures for music recommendation
D. Bogdanov, J. Serrà, N. Wack, & P. Herrera
Music Information Retrieval Evaluation eXchange (MIREX). Oct 2009.
[mtg] [mirex]

Assessing the results of a cover song identification system with coverSSSSearch
J. Serrà
Demo Session at the Int. Soc. for Music Information Retrieval Conf. (ISMIR). Oct 2009.
[mtg]

Cross recurrence quantification for cover song identification
J. Serrà, X. Serra, & R.G. Andrzejak
New Journal of Physics 11: 093017. Sep 2009.
[mtg] [doi] [code]

Shape-based spectral contrast descriptor
V. Akkermans, J. Serrà, & P. Herrera
Sound and Music Computing Conf. (SMC), pp. 143-148. Jul 2009.
[mtg] [smc]

Music mood annotator design and integration
C. Laurier, O. Meyers, J. Serrà, M. Blech, & P. Herrera
Int. Workshop on Content-Based Multimedia Indexing (CBMI), pp. 156-161. Jun 2009.
[mtg] [doi]

2008

Music similarity systems and methods using descriptors
E. Gómez, P. Herrera, P. Cano, J. Janer, J. Serrà, J. Bonada, S. El-Hajj, T. Aussenac, & G. Holmberg
Patent US 2008/300702, published December 31, 2008. Priority nums. US20070946860P-20070628, US20070970109P-20070905, and US20070988714P-20071116. Also published as WO 2009/001202.
[fpo] [espacenet]

Statistical analysis of chroma features in western music predicts human judgments of tonality
J. Serrà, E. Gómez, P. Herrera, & X. Serra
Journal of New Music Research 37(4): 299-309. Dec 2008.
[mtg] [doi]

Transposing chroma representations to a common key
J. Serrà, E. Gómez, & P. Herrera
Int. Conf. on The Use of Symbols to Represent Music and Multimedia Objects, pp. 45-48. Oct 2008.
[mtg] [unimi]

Improving binary similarity and local alignment for cover song detection
J. Serrà, E. Gómez, & P. Herrera
Music Information Retrieval Evaluation eXchange (MIREX). Sep 2008.
[mtg] [mirex]

Chroma binary similarity and local alignment applied to cover song identification
J. Serrà, E. Gómez, P. Herrera, & X. Serra
IEEE Trans. on Audio, Speech and Language Processing 16(6): 1138-1152. Aug 2008.
[mtg] [doi]

Audio cover song identification based on tonal sequence alignment
J. Serrà & E. Gómez
IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 61-64. Apr 2008.
[mtg] [doi]

2007

A qualitative assessment of measures for the evaluation of a cover song identification system
J. Serrà
Int. Conf. on Music Information Retrieval (ISMIR), pp. 319-322. Sep 2007.
[mtg] [ismir]

A cover song identification system based on sequences of tonal descriptors
J. Serrà & E. Gómez
Music Information Retrieval Evaluation eXchange (MIREX). Sep 2007.
[mtg] [mirex]

Music similarity based on sequences of descriptors: tonal features applied to cover song identification
J. Serrà
MSc Thesis. Universitat Pompeu Fabra, Barcelona, Spain. Sep 2007.
[mtg]

[Back to top]

Talks

Musical version matching with segment reductions and contrastive learning: Invited talk at the MILA/Concordia Conversational AI Reading Group (23/10/2025), keynote talk at the 2025 CVCRD Workshop (25/7/2025), invited talk at the LLM Mini-Workshop of Universitat de Lleida (26/3/2025).

How I got here (and some potentially useful advice that came up along the way): Keynote talk at the opening ceremony of the 2025 EMAI Master at Universitat Pompeu Fabra (1/9/2025).

AI-related panels and debates: Panelist at the UPF workshop on Generative Music in the Context of Trustworthy AI (30/5/2025), panelist at the 2025 meeting of the Catalan RDI-IA network (22/5/2025), debate participation at the 2024 sAIling Catalonia event (19/9/2024).

Generative modeling at Dolby Laboratories: Monthly talk at LISTEN - Télécom Paris (30/11/2023), invited talk at Cicle de conferències Patronat/EPS at Universitat de Girona (30/11/2021).

Universal speech enhancement with score-based diffusion: Invited talk at Apple MLR Barcelona (13/7/2023), invited talk at Adobe Research - Audio San Francisco (27/7/2022).

Imagination in power! Deep generative models with a focus on speech and music: Invited talk at Casa de Cultura de Girona - Últimes Fronteres (12/5/2022).

Universal, generative speech enhancement: Keynote at the Deep Learning Barcelona Symp. 2021 (23/12/2021) [youtube, starts around 3:02:00].

Understanding and visualizing speech quality: Keynote at the Web Audio Conf. 2021 (13/7/2021) [youtube]. With J. DeLancey.

Version identification in the 20s: Tutorial at the Int. Soc. for Music Information Retrieval Conf. (11/10/2020) [ismir] [slides]. With F. Yesiler & C. Tralie.

From correlation to imagination - GANs and deep generative models: Keynote at the Conf. of the Catalan Association for Artificial Intelligence - CCIA 2019 (24/10/2019), invited talk at Universitat de Lleida - Màster Gestió Administrativa (27/5/2019), invited talk at Universitat de Vic - Jornades de Mecatrònica (9/5/2019), invited seminar at Enginyeria La Salle - Universitat Ramon Llull (26/3/2019), selected talk at Telefónica Talks - Madrid (27/9/2018).

Deep learning at Telefónica Research: Invited lecture at Barcelona Technology School (26/6/2018), invited talk at Dolby Laboratories - Barcelona (14/3/2018).

Overcoming catastrophic forgetting with hard attention to the task: Invited talk at bcn.ai (6/9/2018), invited seminar at CVC-UAB - Lifelong Learning Seminar (16/2/2018).

Unintuitive properties of deep neural networks: Invited talk at EU Science Hub - Human Behavior and Machine Intelligence Workshop (5/3/2018), invited seminar at UPC/TelecomBCN - IDL Winter School (30/1/2018).

Facts and myths about deep learning: Invited talk at Madrid Machine Learning (11/5/2017), invited seminar at IIIA-CSIC Seminars (15/3/2017), invited talk at B-Debate on Artificial Intelligence (7-8/3/2017) [vimeo], invited lecture at UPC-TelecomBCN DLSL Winter School (27/1/2017) [youtube], invited seminar at UPF MTG-Compmusic Seminar (18/11/2016) [youtube], talk at DataBeersBCN (4/10/2016) [youtube].

Note onset deviations as musical piece signatures: Invited lecture at UPF - Música i Publicitat (12/11/2014).

Patterns, regularities, and evolution of contemporary western popular music: Invited lecture at UPF - Música i Publicitat (6/11/2012, 13/11/2013), invited seminar at CRM - CAMP seminar (3/5/2012).

Machine learning for music discovery: Tutorial at the Int. Symp. on Frontiers of Research on Speech and Music (20/1/2012) [compmusic] [slides].

Audio content-based music retrieval: Tutorial at the Int. Soc. for Music Information Retrieval Conf. (24/10/2011) [mtg] [ismir] [slides]. With M. Müller.

Model-based cover song detection via threshold autoregressive forecasts: Invited talk at UPC - TALP (6/10/2010).

Music descriptor time series: Invited talk at Max Planck Institute for the Physics of Complex Systems (5/2/2010).

Tecnologia per al descobriment de cançons i per la predicció d’èxits musicals (Technology for song discovery and hit prediction): Invited talk at ESMUC - Seminaris de Sonologia (9/10/2006).

[Back to top]

Miscellaneous

Experience/education

Sony AI (2024-Present). Staff research scientist and team lead, Music Foundation Model Team.

Dolby Laboratories (2019-2024). Staff research scientist, senior staff research scientist, and research manager of the Applied AI team, Advanced Technology Group.

Telefónica R&D (2015-2019). Research scientist and senior research scientist. Machine Learning, Data Mining and User Modeling Group.

Spanish National Research Council (2011-2015). Postdoctoral researcher. Artificial Intelligence Research Institute (IIIA-CSIC). Dept. of Learning Systems.

Max Planck Institute for Computer Science (Nov 2011-Jan 2012). Invited postdoctoral researcher. Research group on Multimedia Information Retrieval and Music Processing.

Universitat Pompeu Fabra (2006-2011). MSc candidate, PhD candidate, postdoctoral researcher, and teaching assistant. Dept. of Information and Communication Technologies, Music Technology Group.

Max Planck Institute for the Physics of Complex Systems (Feb-Jun 2010). Guest scientist. Research group on Nonlinear Dynamics and Time Series Analysis.

Polyphonic HMI (2005-2006). Research engineer. R&D Dept.

Enginyeria La Salle, Universitat Ramon Llull (1998-2004). Undergraduate studies. Electronics Engineering and Telecommunications Engineering (two degrees).

[Back to top]

Scientific service

Journal referee: Connection Science (2007), EURASIP Journal on Advances in Signal Processing (2010), IEEE Journal of Selected Topics in Signal Processing (2011), Journal of New Music Research (2012-2014), Journal of Intelligent Information Systems (2012), Artificial Intelligence (2013-2014), IEEE Trans. on Audio, Speech and Language Processing (2013-2014), IEEE Trans. on Multimedia (2014-2015), Knowledge and Information Systems (2014), PLoS ONE (2014), Information Sciences (2014), EURASIP Journal on Audio, Speech, and Music Processing (2015), ACM Trans. on Multimedia Computing, Communications, and Applications (2015-2016), Mathematical Problems in Engineering (2016), Knowledge-Based Systems (2017).
Since 2017, I have not reviewed for journals that charge publication fees or keep articles behind access paywalls.

Conference reviewer or area chair: ISMIR (2008-2014, 2024-2025), ICMC (2009-2010), SMC (2010-2013), ICASSP (2011-2013, 2016), ACM-MM (2013-2014), AES (2013), TRI (2015), UbiComp (2015), AAAI (2016-2018), ICWSM (2017), NIPS-ML4Audio (2017), CCIA (2018), KDD (2019), IJCAI (2019), NeurIPS (2020-2021, 2025), ICLR (2021-2022), ICML (2021-2022, 2024), INTERSPEECH (2021-2022).
Since 2018, I have not reviewed for conferences that charge publication fees or keep articles behind access paywalls.

Conference organization: MIRUM (2011-2012), SMC 2010, IberSpeech 2018, DLBCN (2018-2019, 2021-2025).

Research funding agencies: CONICYT (2017).

Others: Reviewer for MIT Technology Review (2018), mentor at AAAI Doctoral Consortium (2012), advisor at bcn.ai (2018-2020), mentor at DLBCN (2022, 2024).

Member of the ELLIS Society (2022).

[Back to top]

Projects

[I’m currently not involved in funded R&D projects, but in the past I was…]

Publicly-funded:

Accordion (2019-2021): Adaptive edge/cloud compute and network continuum over a heterogeneous sparse edge infrastructure to support nextgen applications. European Commission: RIA-2019-871793 (as PI, submitted the proposal).
i-BiDaaS (2018-2020): Industrial-driven big data as a self-service solution. European Commission: RIA-2017-780787.
BISON (2015-2017): Big speech data analytics for contact centers. European Commission: ICT-2014-15-645323.
COGNITIO (2013-2015): Multiparametric analysis of image, clinical data and therapy for the optimization of cognitive rehabilitation on TBI. Spanish Government, Ministry of Economy and Competitiveness: TIN-2012-38450-C03-03.
PRAISE (2012-2015): Practice and performance analysis inspiring social education. European Commission: ICT-2011-8-318770.
WorthPlay (2012-2014): Worth playing: digital games for active and positive ageing. CSIC General Foundation and Obra Social La Caixa: “Proyecto cero” on ageing 2011.
CompMusic (2011-2016): Computational models for the discovery of the world’s music. European Research Council: ERC grant agreement 267583.
BuscaMedia (2010-2011): Automatic generation of audiovisual narrative. Spanish Government, Ministry of Science and Innovation: CENIT-2009-1026.
DRIMS (2009-2012): Description and retrieval of music and sound information. Spanish Government, Ministry of Science and Innovation: TIN-2009-14247-C02-01.
Music 3.0 (2009-2010): Integrated system for music creation, interaction and socialization. Spanish Government, Ministry of Industry, Tourism and Trade: Avanza contenidos, TSI-070100-2008-318.
PHAROS (2007-2009): Platform for search of audiovisual resources across on-line spaces. European Commission: IST-2006-045035.
SALERO (2006-2009): Semantic audiovisual entertainment reusable objects. European Commission: IST-2007-0309BSCW.
CANTATA (2006-2008): Content aware networked systems towards advanced and tailored assistance. European Commission: ITEA-PROFIT, FIT-350205-2007-10.
EmCAP (2005-2008): Emergent cognition through active perception. European Commission: IST-2006-013123.

Privately-funded:

Of course, several projects at Dolby Laboratories.
Of course, several projects at Telefónica R&D.
AISLE (2012): Artificial intelligence software for logistics in enterprises. Conzentra Tecnologías de la Información S.L., IIIA-010108120004.

[Back to top]

Merits

Merits, awards and competitive grants:

Best student paper nomination (Santi Pascual) for our paper “Learning problem-agnostic speech representations from multiple self-supervised tasks”. Conf. of the Int. Speech Comm. Assoc. (INTERSPEECH), 2019.
Juan de la Cierva Incorporación postdoctoral fellowship (IJCI-2014-19901; Ranked 1/58). Spanish Government, Ministry of Economy and Competitiveness, 2015. Declined.
Top 1% performer (Ranked 11/1528) in the Kaggle AXA Driver Telematics Analytics challenge, 2015.
Best paper award for our paper “Cognitive prognosis of acquired brain injury patients using machine learning techniques”. Int. Conf. on Advanced Cognitive Technologies and Applications (COGNITIVE), 2013.
Best-in-class award in the 2012 Music Information Retrieval Evaluation eXchange contest (MIREX12 Structure Segmentation task).
Knowledge transfer award in the Information and Communication Technologies area (PhD thesis modality). Board of Trustees of Universitat Pompeu Fabra, 2011.
European doctorate mention. Universitat Pompeu Fabra, 2011.
JAE-DOC postdoctoral grant (JAEDOC069/2010). Consejo Superior de Investigaciones Científicas (CSIC, Spanish National Research Council), 2011-2014.
Best-in-class award in the 2010 Music Information Retrieval Evaluation eXchange contest (MIREX10 Audio Classical Composer Identification task).
One-to-six-month grant for PhD students and junior researchers (A/09/96235). Deutscher Akademischer Austausch Dienst (DAAD, German Academic Exchange Service), 2010.
Predoctoral scholarship for short research stays abroad (BE-DGR-2009). Catalan Government, Agency for Management of University and Research Grants (AGAUR), 2010. Declined.
Best-in-class award in the 2009 Music Information Retrieval Evaluation eXchange contest (MIREX09 Audio Cover Song Identification task).
Best-in-class award in the 2009 Music Information Retrieval Evaluation eXchange contest (MIREX09 Audio Classical Composer Identification task).
Best-in-class award in the 2008 Music Information Retrieval Evaluation eXchange contest (MIREX08 Audio Cover Song Identification task).
Best-in-class award in the 2007 Music Information Retrieval Evaluation eXchange contest (MIREX07 Audio Cover Song Identification task).
R+D+I scholarship. Universitat Pompeu Fabra, 2006-2010.

Appearances in media:

[Sorry for the broken links; I promise these all worked at some point.]

Quins han estat els articles més citats del CCIA? “Towards a universal neural network encoder for time series” — ACIA Nodes.
An artificial intelligence that does not forget what it learns (2018) — BlogThinkBig, TechWorld News.
Just four tweets can reveal the identity of an anonymous troll (2018) — New Scientist, Comm. of the ACM.
¡Sistemas de recomendación en forma! (2018) — BlogThinkBig.
Note onset deviations as musical piece signatures (2013) — Diari Ara.
The computer as music critic (2012) — NYTimes, ACIA Nodes.
Measuring the evolution of contemporary western popular music (2012) — Nature Asia, TVE (Informativos, min 42:30), Antena3 (Telediario, part 3, min 3:40), La Vanguardia, El País, Scientific American, The Economist (Babbage), Reuters, CNet, The Guardian, Daily Mail, The Telegraph, … Also highlighted in Nature.com.
Identification of versions of the same musical composition by processing audio descriptions (2011) — UPF e-Notícies.
MTG Technology (2010) — TVE (Tres 14, min 19).
Unsupervised detection of cover song sets: accuracy improvement and original identification (2010) — UPF e-Notícies.
Cross recurrence quantification for cover song identification (2009) — COM Radio (Extraradi), Diario ABC, Science Daily, Innovations-report, PhysOrg, SINC, Asian News Int., …

[Back to top]

Teaching

[Currently I’m not teaching]

Invited lecturer. Universitat de Vic, Vic, Barcelona (2020); Postgraduate course on Artificial Intelligence with Deep Learning, Universitat Politècnica de Catalunya, Barcelona (2019-2020); Master in Sound and Music Computing, Universitat Pompeu Fabra, Barcelona (2019). Deep learning seminars: theory, applications, and coding.

Part-time adjunct professor, Universitat de Vic, Vic, Barcelona. Undergraduate studies of Biomedical Engineering. Diagnosis Decision Support Systems: Deep learning seminars & coding (2018-2019).

Tenure-track lecturer accreditation (“Professor Lector”) from the Catalan Government (Agency for Management of University and Research Grants, AGAUR), Jan 2014.

Teaching assistant, Dept. of Inf. and Com. Tech., Universitat Pompeu Fabra, Barcelona. Undergraduate studies. Probability and stochastic processes (2010-2011), calculus: numerical methods (2009-2010), physical foundations of computer science (2008-2010), and computers III (2006-2009).

Invited instructor, Eng. La Salle, Universitat Ramon Llull, Barcelona. MSc studies. Máster de Producción Sonora y Audio Digital. Music information retrieval (2007-2008).

[Back to top]

Students/interns

R.O. Araz. Building factual super-similarity for music segments. PhD thesis, Universitat Pompeu Fabra. 2022-Ongoing. Co-directed with D. Bogdanov and X. Serra.

D. Goswami. Training data attribution in diffusion models. Student internship, Sony AI. October 2025.

A. Riou. Music sample identification. Student internship, Sony AI. September 2025.

A. Gui. Machine unlearning for detecting problematic music source separation data. Student internship, Sony AI. September 2025. Co-supervised with K. Shimada, T. Shibuya, & W.-H. Liao.

E. Moliner. Generative prediction of music mixing parameters. Student internship, Sony AI. August 2025. Co-supervised with M.A. Martínez-Ramírez & W.-H. Liao.

E. Mancini. Audio-based lyrics matching. Student collaboration, Sony AI. July 2025.

Y. Özer. A systematic framework for comprehensive analysis of the robustness of audio watermarking approaches. Student internship, Sony AI. December 2024. Co-supervised with W. Choi, M. Singh, & W.-H. Liao.

L.A. Lanzendörfer. Investigating training data attribution in text-to-music generative models. Student internship, Sony AI. August 2024. Co-supervised with M. Singh.

A. Gómez-Villa. Synthesis from and separation in CLAP latents. Student internship, Dolby Laboratories. August 2023. Co-supervised with S. Pascual & J. Pons.

R.O. Araz. Improving quality and speed of universal speech enhancement with diffusion models. Student internship, Dolby Laboratories. June 2022.

F. Yesiler. Data-driven musical version identification: accuracy, scalability, and bias perspectives. PhD thesis, Universitat Pompeu Fabra. 2018-2022. Co-directed with E. Gómez.

G. Cambara. The effect of regressive and discriminative workers for learning self-supervised speech representations. Student internship, Dolby Laboratories. Sep 2021. Co-supervised with S. Pascual.

J. Bustos. Towards audio-conditioned generation of music album artwork. MSc thesis, Universitat Pompeu Fabra. Sep 2020. Co-directed with P. Herrera.

C. Steinmetz. Learning to mix with neural audio effects in the waveform domain. MSc thesis, Universitat Pompeu Fabra, and student internship, Dolby Laboratories. Sep 2020. Co-directed with F. Font.

S. Pascual. End-to-end speech synthesis using deep neural networks. PhD thesis, Universitat Politècnica de Catalunya. 2017-2020. Co-directed with A. Bonafonte.

M. Serra-Peralta. Tunable, flow-based recommendations. Student internship, Telefónica Research. Dec 2019. Co-supervised with C. Segura.

D. Álvarez. Out-of-distribution likelihoods in deep generative models. Student internship, Telefónica Research. Nov 2019.

M. Carós. Deep anomaly detection in machine-to-machine network logs. Student internship, Telefónica Research. Jul 2019. Co-supervised with A. Lutu & D. Perino.

A. Gilbert. An investigation of in-sample forgetting in deep convolutional neural networks. MSc thesis, Universitat Pompeu Fabra. Jul 2019. Co-directed with M. Farrús.

J.F. Núñez. Normalizing flows for novelty detection. MSc thesis, Universitat Pompeu Fabra. Jul 2019. Co-directed with V. Gómez.

O. Slizovskaya. A prospective study of time series anomaly detection with normalizing flows. Student internship, Telefónica Research. Dec 2018. Co-supervised with I. Leontiadis.

M. del Tredici. Graph convolutional networks for recommender systems. Student internship, Telefónica Research. Oct 2018. Co-supervised with A. Karatzoglou, J. Luque, & C. Segura.

S. Raponi. User anonymity and reidentification in mobile weblog traces. Student internship, Telefónica Research. Aug 2018. Co-supervised with N. Kourtellis, I. Leontiadis, & D. Perino.

J. Pons. Neural network architectures for few-instance audio classification. Student internship, Telefónica Research. Jul 2018.

D. Surís & M. Miron. Overcoming catastrophic forgetting in neural networks. Student internships, Telefónica Research. Dec 2017.

A. Pyrgelis. Assessing mobility trajectory uniqueness in telco networks. Student internship, Telefónica Research. Nov 2017. Co-supervised with I. Leontiadis, N. Kourtellis, & C. Soriente.

S. Pascual. Deep spatiotemporal mobility prediction from cellular network events. Student internship, Telefónica Research. Dec 2016.

M.A. Orakzai. Gait-based authentication in real life. MSc thesis, Universitat Politècnica de Catalunya. Student internship, Telefónica Research. Jul 2016. Co-supervised with A. Matic, L. Navarro, & C. Soriente.

G. Pelino & Z. Holler. Predicting the success of telemarketing campaign calls. MSc thesis, Barcelona Graduate School of Economics. Student internships, Telefónica Research. Jul 2016. Co-supervised with A. Karatzoglou & A. Matic.

J.C. Vásquez-Correa. Analyzing the robustness of speech-based Parkinson detection algorithms under noise and audio transformations. Student internship, Telefónica Research. Feb 2016.

A. Bogomolov. Using call detail records for understanding crime data. Student internship, Telefónica Research. Jan 2016. Co-supervised with N. Oliver.

F. Capó. Western classical composer identification using symbolic data. MSc thesis, Universitat Pompeu Fabra. Sep 2015. Co-directed with P. Herrera.

F. Font. Tag recommendation using folksonomy information for online sound sharing platforms. PhD thesis, Universitat Pompeu Fabra. Jun 2015. Co-directed with X. Serra.

G. Herrero. Towards supervised music structure annotation: a case-based fusion approach. MSc thesis, Universitat Pompeu Fabra. Sep 2014.

M. Carbonell. Power laws: from linguistics to music. Bachelor’s final project, Universitat Autònoma de Barcelona. Feb 2014. Co-directed with A. Corral.

G. Meseguer. Automatic content-based detection of influences in the history of progressive rock music. MSc thesis, Universitat Pompeu Fabra. Sep 2013. Co-directed with P. Herrera.

J. Van Balen. Automatic recognition of samples in musical audio. MSc thesis, Universitat Pompeu Fabra. Sep 2011. Co-directed with M. Haro.

C.A. De Los Santos. Nonlinear audio recurrence analysis with application to music genre classification. MSc thesis, Universitat Pompeu Fabra. Sep 2010. Co-directed with R.G. Andrzejak.

S. Bromberg. Recurrence quantification analysis in music information retrieval tasks: an example on genre classification. MSc thesis, Universitat Pompeu Fabra. Jun 2010. Co-directed with R.G. Andrzejak.

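A. Almarza. Implementació i avaluació d’algorismes per la descripció i classificació rítmica de la música (Implementation and evaluation of algorithms for the rhythmic description and classification of music). Bachelor’s final project, Universitat de Girona. Apr 2010.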

C. Quirante. Quality assessment and enhancement of an industrial-strength audio fingerprinting system. MSc thesis, Universitat Pompeu Fabra. Sep 2009. Co-directed with P. Cano.

[Back to top]

Contact

Physical address:

Joan Serrà
Sony AI - Barcelona
OneCoWork Portal de l’Àngel
Av. del Portal de l’Àngel 40, planta 7
08002 Barcelona
firstname (dot) serra (at) sony (dot) com

[Back to top]

[Last edit: October 2025]