oai:repositori.upf.edu:10230/25398
MOOCs en España. Panorama actual de los Cursos Masivos Abiertos en Línea en las universidades españolas
Oliver Riera, Miquel
Hernández Leo, Davinia
Daza, Vanesa
Martín i Badell, Carles
Albó, Laia
Universitat Pompeu Fabra
We can state that Spain has, in a remarkably short time and quite surprisingly, positioned itself in the leading group of countries generating the most activity around Massive Open Online Courses (MOOCs). That Spain was the leading European country in MOOC offerings during 2013, with more than a hundred courses on offer, ahead of the United Kingdom, Germany and France, is a more than remarkable fact. If we look at demand, that is, the volume of participation in the worldwide MOOC offering, we again find Spain among the five countries with the most students following this type of training, behind countries such as the USA, the United Kingdom, Canada and Brazil. At the Telefónica Chair of Universitat Pompeu Fabra, we find it particularly striking that Spain suddenly sits within this select "G8", alongside undisputed world powers in higher education. We believe the causes of this phenomenon in Spain are worth analysing, and to that end we have prepared this first report. As in the preliminary phase of any research, it is important to start from good hypotheses and from solid empirical data to work on. Along these lines, "MOOCs en España" is the first report produced by the UPF Telefónica Chair on Social Innovation in Education. During 2013, an exhaustive effort was made to collect information on massive open online courses in Spain, as a necessary first step towards understanding their key aspects and the impact they may come to have at the social level.
2014-01
info:eu-repo/semantics/report
Oliver M, Hernández-Leo D, Daza V, Martín C, Albó L. MOOCs en España. Panorama actual de los Cursos Masivos Abiertos en Línea en las universidades españolas. Barcelona: Universitat Pompeu Fabra; 2014. 33 p. (Cuaderno Red de Cátedras Telefónica. Social Innovation in Education)
http://hdl.handle.net/10230/25398
spa
Cuaderno Red de Cátedras Telefónica. Social Innovation in Education
http://creativecommons.org/licenses/by-nc-sa/3.0/es/
info:eu-repo/semantics/openAccess
Attribution-NonCommercial-ShareAlike 3.0 Spain
oai:repositori.upf.edu:10230/25400
MOOCs en España. Análisis de la demanda. Panorama actual de los Cursos Masivos Abiertos en Línea en la plataforma Miríada X
Oliver Riera, Miquel
Hernández Leo, Davinia
Albó, Laia
Universitat Pompeu Fabra
This new report by the Telefónica Chair of Universitat Pompeu Fabra represents a second analysis of the MOOC (Massive Open Online Course) phenomenon, from a perspective more focused on the demand for this type of training. A year ago we pointed to Spain as the leading country in Europe in the production of massive online courses; we can state that it remains on the crest of the wave, although the distances to other MOOC-producing countries have shortened. The analysis carried out in this report allows us to speak of consolidation and diversification of the MOOC phenomenon, giving rise to educational by-products such as SPOCs or NanoMOOCs, among others. This hybridisation is driven by the strong existing demand from different groups: some oriented towards creating an offering of courses for local consumption by mainly university students (SPOCs, Small Private Online Courses), others towards creating modules or fractions of courses of very short duration with highly practical training purposes (NanoMOOCs). The consolidation of the MOOC phenomenon can be observed in the appearance of second and third editions of the same offering, which become successful courses that already bring together several thousand participants. In this analysis, however, we have chosen to focus on demand, that is, on studying the profiles of MOOC participants. We have been particularly interested in their consumption patterns and social profile. To this end, we have had the collaboration of the technical team of Miríada X (www.miriadax.net), the leading platform in Latin America, together with Telefónica Educación Digital, which provided us with information on nearly 200,000 participants in almost 150 MOOCs launched in 2014. The analysis of demand is one of the keys that will allow us to better understand the MOOC phenomenon and the possible developments that may unfold over the coming years. The analysis lets us uncover and quantify peculiarities of this type of training, for example that women achieve a higher performance, or course completion rate, than the male audience. We also find no large differences in thematic course preferences by gender, with MOOCs in Technological Sciences being the most demanded even among the female audience. We knew that the Achilles' heel of MOOCs is the completion rate, but we discovered that it is not homogeneously distributed: courses in History, Geography, or Arts and Humanities, taken by a more senior audience, can reach double the average completion rate. We found that Colombia is the country generating the most demand on Miríada X, after Spain of course. Finally, we see that the Miríada X audience in Latin America is formed mostly of university students with a lower average age, which suggests that it is the ideal platform for universities in Spain to promote postgraduate or specialisation training in that part of the world. These are just some of the conclusions reached in this report. We hope it helps to contribute reflection, data, and knowledge about the MOOC phenomenon from a new global perspective.
2015-11
info:eu-repo/semantics/report
Oliver M, Hernández-Leo D, Albó L. MOOCs en España. Análisis de la demanda. Panorama actual de los Cursos Masivos Abiertos en Línea en la plataforma Miríada X. Barcelona: Universitat Pompeu Fabra; 2015. 36 p. (Cuaderno Red de Cátedras Telefónica. Social Innovation in Education)
http://hdl.handle.net/10230/25400
spa
Cuaderno Red de Cátedras Telefónica. Social Innovation in Education
http://creativecommons.org/licenses/by-nc-sa/3.0/es/
info:eu-repo/semantics/openAccess
Attribution-NonCommercial-ShareAlike 3.0 Spain
oai:repositori.upf.edu:10230/34013
ISMIR 2004 audio description contest
Cano Vila, Pedro
Gómez Gutiérrez, Emilia, 1975-
Gouyon, Fabien
Herrera Boyer, Perfecto, 1964-
Koppenberger, Markus
Ong, Bee Suan
Serra, Xavier
Streich, Sebastian
Wack, Nicolas
In this paper we report on the ISMIR 2004 Audio Description Contest. We first detail the contest organization, evaluation metrics, data and infrastructure. We then provide the details and results of each contest in turn. Published papers and algorithm source codes are given when originally available. We finally discuss some aspects of these contests and propose ways to organize future, improved audio description contests.
2006
info:eu-repo/semantics/workingPaper
Cano P, Gómez E, Gouyon F, Herrera P, Koppenberger M, Ong B, Serra X, Streich S, Wack N. ISMIR 2004 audio description contest. Barcelona: Universitat Pompeu Fabra, Music Technology Group; 2006. 20 p. Report No.: MTG-TR-2006-02
http://hdl.handle.net/10230/34013
eng
https://creativecommons.org/licenses/by-nc-nd/2.5/
info:eu-repo/semantics/openAccess
This work is licenced under the Creative Commons Attribution-NonCommercial-NoDerivs 2.5
oai:repositori.upf.edu:10230/34306
FORGe at WebNLG 2017
Mille, Simon
Dasiopoulou, Stamatia
This paper describes the FORGe generator at WebNLG. The input DBpedia triples are mapped onto sentences by applying a series of rule-based graph-transducers and aggregation grammars to template predicate-argument structures associated to each property. We submitted two primary systems to the task, one based on the grammars and one based on templates, and one secondary system, which is a variation of the grammar-based one.
2017
info:eu-repo/semantics/workingPaper
Mille S, Dasiopoulou S. FORGe at WebNLG 2017. Barcelona: Universitat Pompeu Fabra. Department of Information and Communications Technologies, 2017. Report No.: 17-09.
http://hdl.handle.net/10230/34306
eng
info:eu-repo/grantAgreement/EC/H2020/645012
info:eu-repo/semantics/openAccess
Universitat Pompeu Fabra
oai:repositori.upf.edu:10230/34307
FORGe at E2E 2017
Mille, Simon
Dasiopoulou, Stamatia
This paper describes the FORGe generator at E2E. The input triples are mapped onto sentences by applying a series of rule-based graph-transducers and aggregation grammars to template predicate-argument structures associated to each property.
2017
info:eu-repo/semantics/workingPaper
Mille S, Dasiopoulou S. FORGe at E2E 2017. Barcelona: Universitat Pompeu Fabra. Department of Information and Communications Technologies, 2017. Report No.: 17-12.
http://hdl.handle.net/10230/34307
eng
info:eu-repo/grantAgreement/EC/H2020/645012
info:eu-repo/semantics/openAccess
oai:repositori.upf.edu:10230/34517
Data modelling for the evaluation of virtualized network functions resource allocation algorithms
Rankothge, Windhya
Le, Franck
Russo, Alessandra
Lobo, Jorge
To conduct more realistic evaluations of resource allocation algorithms for Virtualized Network Functions (VNFs), researchers need data on: (1) potential Network Function (NF) chains (policies), (2) traffic flows passing through these NF chains, (3) how dynamic traffic changes affect the NFs (scale out/in), and (4) different data center architectures for the cloud infrastructure. However, there are no publicly available real data sets on NF chains and the traffic that passes through NF chains. Therefore, we have used data from previous empirical analyses [1], [2] and made some assumptions to derive the data required to evaluate resource allocation algorithms for VNFs. We developed four programs to model the gathered data and generate the required data. All gathered data and data modelling programs are publicly available at [3]. We have used these data for our work in [4] and [5].
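As a loose illustration of the idea of procedurally deriving NF chains and traffic demands (not the authors' actual programs, which are available at [3]), here is a minimal Python sketch; the NF catalogue, chain lengths, and bandwidth ranges are invented for illustration:

```python
import random

# Hypothetical NF catalogue; names and parameters are illustrative only.
NF_TYPES = ["firewall", "ids", "proxy", "nat", "load_balancer"]

def random_policy(max_len=4):
    # One NF chain (policy): an ordered selection of distinct NFs.
    return random.sample(NF_TYPES, random.randint(1, max_len))

def random_traffic(n_flows=10, peak_mbps=100):
    # Each flow is routed through a chain and given a bandwidth demand.
    return [{"chain": random_policy(),
             "mbps": round(random.uniform(1, peak_mbps), 1)}
            for _ in range(n_flows)]

for flow in random_traffic(3):
    print(flow)
```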
2017
info:eu-repo/semantics/workingPaper
Rankothge W, Le F, Russo A, Lobo J. Data modelling for the evaluation of virtualized network functions resource allocation algorithms. 2017. 4 p.
http://hdl.handle.net/10230/34517
eng
http://arxiv.org/abs/1702.00369
Related publication: Rankothge W, Ma J, Le F, Russo A, Lobo J. Experimental results on the use of genetic algorithms for scaling virtualized network functions. In: 2015 IEEE Conference on Virtualization and Software Defined Network (NFV-SDN); 2015 Nov 18-21; San Francisco, CA. IEEE; 2015. p. 47-53. DOI 10.1109/NFV-SDN.2015.7387405 http://hdl.handle.net/10230/26036
http://hdl.handle.net/10230/26036
http://creativecommons.org/licenses/by-nc/3.0/es/
info:eu-repo/semantics/openAccess
Work under a Creative Commons Attribution-NonCommercial 3.0 Spain licence (CC BY-NC 3.0 ES).
oai:repositori.upf.edu:10230/35526
An Environment for the analysis, transformation and resynthesis of music sounds
Serra, Xavier
This paper describes an environment developed at CCRMA for the analysis, transformation, and resynthesis of sounds. It has been written on a Lisp Machine workstation, using an Array Processor to speed up the signal processing operations. The program is designed as a research tool and a sound manipulation workbench for music composition.
1988
info:eu-repo/semantics/workingPaper
Serra X. An Environment for the analysis, transformation and resynthesis of music sounds. Stanford: Stanford University, Center for Computer Research in Music and Acoustics; 1988. 10 p. Report No.: STAN-M-52
http://hdl.handle.net/10230/35526
eng
https://creativecommons.org/licenses/by-nc-nd/3.0/es/
info:eu-repo/semantics/openAccess
© Xavier Serra
oai:repositori.upf.edu:10230/41912
A quantitative comparison of different approaches for melody extraction from polyphonic audio recordings
Gómez Gutiérrez, Emilia, 1975-
Streich, Sebastian
Ong, Bee Suan
Paiva, Rui Pedro
Tappert, Sven
Batke, Jan-Mark
Poliner, Graham
Ellis, Daniel P. W.
Bello, Juan Pablo
This paper provides an overview of current state-of-the-art approaches for melody extraction from polyphonic audio recordings, and it proposes a methodology for the quantitative evaluation of melody extraction algorithms. We first define a general architecture for melody extraction systems and discuss the difficulties of the problem at hand; then, we review different approaches for melody extraction which represent the current state of the art in this area. We propose and discuss a methodology for evaluating the different approaches, and we finally present some results and conclusions of the comparison.
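One standard ingredient of such a quantitative evaluation is a frame-wise pitch accuracy measure. A minimal sketch of that idea, assuming 0 Hz marks unvoiced frames and a 50-cent tolerance (the paper's exact metrics may differ):

```python
import numpy as np

def raw_pitch_accuracy(ref_hz, est_hz, cent_tolerance=50):
    # Fraction of reference-voiced frames whose estimate falls within
    # `cent_tolerance` cents of the reference; 0 Hz marks unvoiced frames.
    ref_hz = np.asarray(ref_hz, dtype=float)
    est_hz = np.asarray(est_hz, dtype=float)
    voiced = ref_hz > 0
    safe_est = np.where(est_hz > 0, est_hz, 1e-9)   # avoid log2(0)
    cents = 1200.0 * np.abs(np.log2(safe_est / np.where(voiced, ref_hz, 1.0)))
    correct = voiced & (est_hz > 0) & (cents <= cent_tolerance)
    return correct.sum() / max(voiced.sum(), 1)

# Both voiced frames are within 50 cents; the unvoiced frame is ignored.
print(raw_pitch_accuracy([220.0, 440.0, 0.0], [221.0, 430.0, 100.0]))  # 1.0
```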
2006
info:eu-repo/semantics/article
Gómez E, Streich S, Ong B, Paiva RP, Tappert S, Batke JM, Poliner G, Ellis D, Bello JP. A quantitative comparison of different approaches for melody extraction from polyphonic audio recordings. 2006. 31 p.
http://hdl.handle.net/10230/41912
eng
info:eu-repo/grantAgreement/EC/FP6/507142
https://creativecommons.org/licenses/by-nc-nd/2.5/
info:eu-repo/semantics/openAccess
This work is licenced under the Creative Commons Attribution-NonCommercial-NoDerivs 2.5
Universitat Pompeu Fabra
oai:repositori.upf.edu:10230/45075
Transparent Facemask, MASKIN: prototipo de mascarilla transparente reutilizable y comprometida con medioambiente y sociedad.
Marco, Álvaro M.
Bernat, Nadia P.
De Vivo, Francesco
Ortega, Juan
Isern, Alejandra
Oviedo, Óscar
Sánchez, Paula
Within the framework of the Premis Enginy COVID19 UPF, the MASKIN student group, with no aim other than the free dissemination of the results achieved, presents its final portfolio in the hope that it may serve as inspiration to third parties, students, and the academic and business communities in the broadest sense.
2020
info:eu-repo/semantics/slide
Marco AM, Bernat NP, De Vivo F, Ortega J, Isern A, Oviedo Ó, Sánchez P. Transparent Facemask, MASKIN: prototipo de mascarilla transparente reutilizable y comprometida con medioambiente y sociedad. Presentación en: Premis "Enginy contra la COVID-19". 13 diapositivas.
http://hdl.handle.net/10230/45075
spa
https://creativecommons.org/licenses/by-nc-nd/4.0/
info:eu-repo/semantics/openAccess
Work under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence.
oai:repositori.upf.edu:10230/45076
Transparent Facemask, MASKIN: prototipo de mascarilla transparente reutilizable y comprometida con medioambiente y sociedad.
Marco, Álvaro M.
Bernat, Nadia P.
De Vivo, Francesco
Ortega, Juan
Isern, Alejandra
Oviedo, Óscar
Sánchez, Paula
2020
info:eu-repo/semantics/other
Marco AM, Bernat NP, De Vivo F, Ortega J, Isern A, Oviedo Ó, Sánchez P. Transparent Facemask, MASKIN: prototipo de mascarilla transparente reutilizable y comprometida con medioambiente y sociedad. Poster presentat a: Premis "Enginy contra la COVID-19". 13 p.
http://hdl.handle.net/10230/45076
spa
info:eu-repo/semantics/openAccess
Work under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence.
oai:repositori.upf.edu:10230/54139
BAF: an audio fingerprinting dataset for broadcast monitoring
Cortès, Guillem
Ciurana, Alex
Molina, Emilio
Miron, Marius
Meyers, Owen
Six, Joren
Serra, Xavier
Audio Fingerprinting (AFP) is a well-studied problem in music information retrieval for various use cases, e.g. content-based copy detection, DJ-set monitoring, and music excerpt identification. However, AFP for continuous broadcast monitoring (e.g. for TV and radio), where music is often in the background, has not received much attention despite its importance to the music industry. In this paper (1) we present BAF, the first public dataset for music monitoring in broadcast. It contains 74 hours of production music from Epidemic Sound and 57 hours of TV audio recordings. Furthermore, BAF provides cross-annotations with exact matching timestamps between Epidemic tracks and TV recordings. Approximately 80% of the total annotated time is background music. (2) We benchmark BAF with public state-of-the-art AFP systems, together with our proposed baseline PeakFP: a simple, non-scalable AFP algorithm based on spectral peak matching. In this benchmark, none of the algorithms obtains an F1-score above 47%, pointing out that further research is needed to reach the AFP performance levels of other studied use cases. The dataset, baseline, and benchmark framework are open and available for research.
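As a rough illustration of spectral peak matching in general (not PeakFP itself, whose exact design is described in the paper), one can extract the strongest spectrogram peaks per frame and count peak coincidences at a given time offset:

```python
import numpy as np
from scipy import signal

def spectral_peaks(audio, sr, n_fft=2048, hop=512, top_k=5):
    # Return a set of (frame, bin) pairs for the top-k magnitude peaks per frame.
    f, t, S = signal.stft(audio, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(S)
    peaks = set()
    for frame in range(mag.shape[1]):
        for b in np.argsort(mag[:, frame])[-top_k:]:
            peaks.add((frame, int(b)))
    return peaks

def match_score(query_peaks, ref_peaks, offset):
    # Count query peaks that coincide with reference peaks at a time offset.
    return sum((frame + offset, b) in ref_peaks for frame, b in query_peaks)

# Sanity check: a signal matched against itself at offset 0 hits every peak.
sr = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
ref = spectral_peaks(tone, sr)
print(match_score(ref, ref, offset=0) == len(ref))  # True
```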
2022-09-21
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/54139
eng
info:eu-repo/grantAgreement/ES/2PE/RTC2019-007248-7
https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
© G. Cortès, A. Ciurana, E. Molina, M. Miron, O. Meyers, J. Six and X. Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
oai:repositori.upf.edu:10230/54154
Violin etudes: a comprehensive dataset for f0 estimation and performance analysis
Tamer, Nazif C.
Ramoneda, Pedro
Serra, Xavier
Violin performance analysis requires accurate and robust f0 estimates to give feedback on playing accuracy. Despite the recent advancements in data-driven f0 estimators, their application to performance analysis remains a challenge due to style-specific and dataset-induced biases. In this paper, we address this problem by introducing Violin Etudes, a 27.8-hour violin performance dataset constructed with domain knowledge in instrument pedagogy and a novel automatic f0-labeling paradigm. Experimental results on unseen datasets show that the CREPE f0 estimator trained on Violin Etudes outperforms the widely used pre-trained version trained on multiple manually labeled datasets. Further preliminary findings suggest that (i) existing data-driven f0 estimators may overfit to equal temperament, and (ii) iterative re-labeling regularized by our novel Constrained Harmonic Resynthesis method can simultaneously enhance datasets and f0 estimators. Our dataset curation methodology is easily scalable to other instruments owing to the quantity of pedagogical data online. It also supports a range of MIR research directions thanks to the performance difficulty labels from educational institutions.
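The Constrained Harmonic Resynthesis method itself is specific to the paper, but the underlying idea of resynthesising a harmonic signal from a frame-wise f0 track can be sketched as follows; the sampling rate, hop size, and 1/k amplitude roll-off here are arbitrary assumptions:

```python
import numpy as np

def harmonic_resynthesis(f0_track, sr=16000, hop=160, n_harmonics=10):
    # f0_track: frame-wise f0 in Hz, 0 for unvoiced frames.
    f0 = np.repeat(np.asarray(f0_track, dtype=float), hop)  # sample-wise f0
    phase = 2 * np.pi * np.cumsum(f0) / sr                  # running phase
    voiced = (f0 > 0).astype(float)
    # Sum of harmonics with a simple 1/k roll-off (an arbitrary choice here).
    out = sum(np.sin(k * phase) / k for k in range(1, n_harmonics + 1))
    return voiced * out / n_harmonics

# 100 frames of A4 followed by 20 unvoiced frames.
audio = harmonic_resynthesis([440.0] * 100 + [0.0] * 20)
print(audio.shape)  # (19200,)
```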
2022-09-22
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/54154
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
© N. C. Tamer, P. Ramoneda, and X. Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: N. C. Tamer, P. Ramoneda, and X. Serra, "Violin Etudes: A Comprehensive Dataset for f0 Estimation and Performance Analysis", in Proc. of the 23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022.
oai:repositori.upf.edu:10230/54155
In search of Sañcāras: tradition-informed repeated melodic pattern recognition in Carnatic music
Nuttall, Thomas
Plaja-Roglans, Genís
Pearson, Lara
Serra, Xavier
Carnatic Music is a South Indian art and devotional musical practice in which melodic patterns (motifs and phrases), known as sañcāras, play a crucial structural and expressive role. We demonstrate how the combination of transposition invariant features learnt by a Complex Autoencoder (CAE) and predominant pitch tracks extracted using a Frequency-Temporal Attention Network (FTANet) can be used to annotate and group regions of variable-length, repeated, melodic patterns in audio recordings of multiple Carnatic Music performances. These models are trained on novel, expert-curated datasets of hundreds of Carnatic audio recordings and the extraction process tailored to account for the unique characteristics of sañcāras in Carnatic Music. Experimental results show that the proposed method is able to identify 54% of all sañcāras annotated by a professional Carnatic vocalist. Code to reproduce and interact with these results is available online.
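A crude sketch of the general self-similarity idea behind grouping variable-length repeated melodic regions from learned frame features (the paper's actual CAE/FTANet pipeline is far more elaborate and tradition-informed):

```python
import numpy as np

def self_similarity(features):
    # features: (n_frames, dim) array, e.g. learned frame embeddings.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-9)
    return f @ f.T  # cosine self-similarity matrix

def repeated_segments(ssm, threshold=0.9, min_len=20):
    # Scan diagonals of the self-similarity matrix for runs of high
    # similarity: a run means frames [i, i+min_len) repeat `lag` frames later.
    n = ssm.shape[0]
    hits = []
    for lag in range(min_len, n):
        run = 0
        for i in range(n - lag):
            run = run + 1 if ssm[i, i + lag] >= threshold else 0
            if run == min_len:
                start = i - min_len + 1
                hits.append((start, start + lag, min_len))
    return hits
```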
2022-09-22
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/54155
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
© T. Nuttall, G. Plaja-Roglans, L. Pearson, X. Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: T. Nuttall, G. Plaja-Roglans, L. Pearson, X. Serra, "In Search of Sañcāras: Tradition-informed Repeated Melodic Pattern Recognition in Carnatic Music", in Proc. of the 23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022.
oai:repositori.upf.edu:10230/54156
A diffusion-inspired training strategy for singing voice extraction in the waveform domain
Plaja-Roglans, Genís
Miron, Marius
Serra, Xavier
Notable progress in music source separation has been achieved using multi-branch networks that operate on both temporal and spectral domains. However, such networks tend to be complex and heavy-weight. In this work, we tackle the task of singing voice extraction from polyphonic music signals in an end-to-end manner using an approach inspired by the training procedure of denoising diffusion models. We perform unconditional signal modelling to gradually convert an input mixture signal to the corresponding singing voice or accompaniment. We use fewer parameters than the state-of-the-art models while operating on the waveform domain, bypassing phase-related problems. More concretely, we train a non-causal WaveNet using a diffusion-inspired strategy, improving the said network for singing voice extraction and obtaining performance comparable to the end-to-end state-of-the-art on MUSDB18. We further report results on a non-MUSDB-overlapping version of MedleyDB and the multi-track audio of the Saraga Carnatic dataset showing good generalization, and run perceptual tests of our approach. Code, models, and audio examples are made available.
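One plausible reading of such a diffusion-inspired training strategy, sketched under the assumption of a simple linear interpolation between the target source and the mixture (the paper's exact schedule, parameterisation, and loss may differ), with `model` a stand-in for the non-causal WaveNet:

```python
import torch
import torch.nn.functional as F

def degrade(vocals, mixture, t, T):
    # Linear interpolation from clean vocals (t=0) to the full mixture (t=T):
    # one plausible reading of "gradually convert a mixture to the voice".
    alpha = (t / T).view(-1, 1)              # per-example step in [0, 1]
    return (1 - alpha) * vocals + alpha * mixture

def training_step(model, vocals, mixture, T=50):
    # vocals, mixture: (batch, samples) waveforms; `model` is a stand-in for
    # the non-causal WaveNet, and its step conditioning is assumed here.
    t = torch.randint(1, T + 1, (vocals.shape[0],)).float()
    x_t = degrade(vocals, mixture, t, T)
    pred = model(x_t, t)                     # predict the clean vocals
    return F.mse_loss(pred, vocals)
```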
2022-09-22
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/54156
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
info:eu-repo/grantAgreement/ES/2PE/RTC2019-007248-7
https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
© G. Plaja-Roglans, M. Miron, and X. Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: G. Plaja-Roglans, M. Miron, and X. Serra, "A diffusion-inspired training strategy for singing voice extraction in the waveform domain", in Proc. of the 23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022.
oai:repositori.upf.edu:10230/54157
Bottlenecks and solutions for audio to score alignment research
Morsi, Alia
Serra, Xavier
Although audio to score alignment is a classic Music Information Retrieval problem, it has not been uniquely defined with respect to the scope of musical scenarios that represent its core. The absence of a unified vision makes it difficult to pinpoint its state of the art and determine directions for improvement. To get past this bottleneck, it is necessary to consolidate datasets and evaluation methodologies to allow comprehensive benchmarking. In our review of prior work, we demonstrate the extent of variation in problem scope, datasets, and evaluation practices across audio to score alignment research. To circumvent the high cost of creating, from scratch, large-scale datasets with various instruments, styles, performance conditions, and musician proficiency, the research community could generate ground-truth approximations from non-audio-to-score-alignment datasets which include a temporal mapping between a music score and its corresponding audio. We show a methodology for adapting the Aligned Scores and Performances dataset, created originally for beat tracking and music transcription. We filter the dataset semi-automatically by applying a set of Dynamic Time Warping based Audio to Score Alignment methods using out-of-the-box Chroma and Constant-Q Transform extraction algorithms, suitable for the characteristics of the piano performances of the dataset. We use the results to discuss the limitations of the generated ground truths and the data adaptation method. While the adapted dataset does not provide the necessary diversity for solving the initial problem, we conclude with ideas for expansion, and identify future directions for curating more comprehensive datasets through data adaptation or synthesis.
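The DTW-plus-chroma machinery mentioned here is standard; a minimal sketch with librosa, assuming the score side has been synthesised to audio and using placeholder file names:

```python
import librosa

# Placeholder file names; the score is assumed to be synthesised to audio.
perf, sr = librosa.load("performance.wav")
score, _ = librosa.load("score_synth.wav", sr=sr)

# Chroma features computed from the Constant-Q transform of each signal.
chroma_perf = librosa.feature.chroma_cqt(y=perf, sr=sr, hop_length=1024)
chroma_score = librosa.feature.chroma_cqt(y=score, sr=sr, hop_length=1024)

# DTW over the two chroma sequences: D is the accumulated cost matrix, wp
# the warping path, i.e. pairs of aligned (score frame, performance frame).
D, wp = librosa.sequence.dtw(X=chroma_score, Y=chroma_perf, metric="cosine")
print("alignment path length:", len(wp))
```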
2022-09-22
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/54157
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
© A. Morsi, X. Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: A. Morsi, X. Serra, "Bottlenecks and Solutions for Audio to Score Alignment Research", in Proc. of the 23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022.
oai:repositori.upf.edu:10230/54158
Music representation learning based on editorial metadata from discogs
Alonso-Jiménez, Pablo
Serra, Xavier
Bogdanov, Dmitry
This paper revisits the idea of music representation learning supervised by editorial metadata, contributing to the state of the art in two ways. First, we exploit the public editorial metadata available on Discogs, an extensive community-maintained music database containing information about artists, releases, and record labels. Second, we use a contrastive learning setup based on COLA, different from previous systems based on triplet loss. We train models targeting several associations derived from the metadata and experiment with stacked combinations of learned representations, evaluating them on standard music classification tasks. Additionally, we consider learning all the associations jointly in a multi-task setup. We show that it is possible to improve the performance of current self-supervised models by using inexpensive metadata commonly available in music collections, producing representations comparable to those learned on classification setups. We find that the resulting representations based on editorial metadata outperform a system trained with music style tags available in the same large-scale dataset, which motivates further research using this type of supervision. Additionally, we give insights on how to preprocess Discogs metadata to build training objectives and provide public pre-trained models.
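A COLA-style contrastive setup can be sketched as follows, assuming anchor and positive embeddings of track pairs linked by a shared metadata association; note that COLA itself compares embeddings bilinearly, whereas this sketch uses cosine similarity:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchors, positives, temperature=0.1):
    # anchors/positives: (batch, dim) embeddings of track pairs linked by a
    # shared association (e.g. same artist or label). Each row's positive is
    # the matching row; all other rows act as in-batch negatives.
    a = F.normalize(anchors, dim=1)
    p = F.normalize(positives, dim=1)
    logits = a @ p.T / temperature                       # (batch, batch)
    labels = torch.arange(a.shape[0], device=a.device)   # diagonal = positives
    return F.cross_entropy(logits, labels)
```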
2022-09-22
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/54158
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
© P. Alonso, X. Serra, and D. Bogdanov. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: P. Alonso, X. Serra, and D. Bogdanov, "Music Representation Learning Based on Editorial Metadata From Discogs", in Proc. of the 23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022.
oai:repositori.upf.edu:10230/54181
MusAV: a dataset of relative arousal-valence annotations for validation of audio models
Bogdanov, Dmitry
Lizarraga Seijas, Xavier
Alonso-Jiménez, Pablo
Serra, Xavier
We present MusAV, a new public benchmark dataset for comparative validation of arousal and valence (AV) regression models for audio-based music emotion recognition. To gather the ground truth, we rely on relative judgments instead of absolute values to simplify the manual annotation process and improve its consistency. We build MusAV by gathering comparative annotations of arousal and valence on pairs of tracks, using track audio previews and metadata from the Spotify API. The resulting dataset contains 2,092 track previews covering 1,404 genres, with pairwise relative AV judgments by 20 annotators and various subsets of the ground truth based on different levels of annotation agreement. We demonstrate the use of the dataset in an example study evaluating nine models for AV regression that we train based on state-of-the-art audio embeddings and three existing datasets of absolute AV annotations. The results on MusAV offer a view of the performance of the models complementary to the metrics obtained during training and provide insights into the impact of the considered datasets and embeddings on the generalization abilities of the models.
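A dataset of relative judgments naturally suggests a pairwise evaluation: score how often a regression model orders each annotated pair the way the annotators did. A minimal sketch of that idea (not MusAV's official evaluation code):

```python
import numpy as np

def pairwise_agreement(pred_values, pairs):
    # pred_values: model predictions (e.g. valence) per track.
    # pairs: (i, j) index pairs where annotators judged track i > track j.
    pred_values = np.asarray(pred_values)
    correct = sum(pred_values[i] > pred_values[j] for i, j in pairs)
    return correct / len(pairs)

# Toy example: scores for 3 tracks and two annotated comparisons.
print(pairwise_agreement([0.2, 0.8, 0.5], [(1, 0), (2, 0)]))  # -> 1.0
```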
2022-09-27
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/54181
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
© D. Bogdanov, X. Lizarraga-Seijas, P. Alonso-Jiménez, and X. Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: D. Bogdanov, X. Lizarraga-Seijas, P. Alonso-Jiménez, and X. Serra, "MusAV: A dataset of relative arousal-valence annotations for validation of audio models", in Proc. of the 23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022.
oai:repositori.upf.edu:10230/55249
Essentia API: a web API for music audio analysis
Correya, Albin Andrew
Bogdanov, Dmitry
Alonso Jiménez, Pablo
Serra, Xavier
We present Essentia API, a web API to access a collection of state-of-the-art music audio analysis and description algorithms based on Essentia, an open-source library and machine learning (ML) models for audio and music analysis. We are developing it as part of a broader project in which we explore strategies for the commercial viability of technologies developed at the Music Technology Group (MTG) following open science and open source practices, which involves finding licensing schemes and building custom solutions. Currently, the API supports music auto-tagging and classification algorithms (for genre, instrumentation, mood/emotion, danceability, approachability, and engagement), and algorithms for musical key, tempo, loudness, and many more. In the future, we envision expanding it with new machine learning models developed by the MTG and our collaborators to facilitate their access for a broader community of users.
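Usage would presumably follow the familiar pattern of posting audio to an HTTP endpoint and reading back JSON descriptors. The endpoint path, parameter names, and response schema below are entirely hypothetical placeholders, not the actual Essentia API contract:

```python
import requests

# Everything below is a hypothetical placeholder: the real endpoint path,
# parameter names, and response schema of the Essentia API are not shown here.
API_URL = "https://example.org/essentia-api/analyze"

with open("track.mp3", "rb") as f:
    resp = requests.post(API_URL, files={"audio": f})
resp.raise_for_status()
print(resp.json())  # e.g. descriptors such as genre, mood, key, tempo
```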
2023-01-10
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/55249
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
© A. Correya, D. Bogdanov, P. Alonso-Jiménez, and X. Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: A. Correya, D. Bogdanov, P. Alonso-Jiménez, and X. Serra, "Essentia API: a web API for music audio analysis", in Extended Abstracts for the Late-Breaking Demo Session of the 23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022.
oai:repositori.upf.edu:10230/55285
Note-level MIDI velocity estimation for piano performance
Kim, Hyon
Miron, Marius
Serra, Xavier
Piano is one of the most popular music instruments. During piano performance, loudness is an important factor for expressiveness alongside tempo: changes in dynamics play with expectation, convey various emotions, and render expressiveness. Due to the polyphonic nature of the piano, and with the goal of better analysing the expressiveness of performances with multiple notes playing simultaneously, it is more useful to find the loudness of each note than to look at the accumulated loudness in a single time frame. Most research on this topic uses Non-negative Matrix Factorization (NMF) techniques to find note-level loudness. In contrast, we propose to use Deep Neural Networks (DNNs) conditioned with score information to estimate the loudness, in terms of MIDI velocity, of each note performed by piano players. To the best of our knowledge, this is novel research on note-level MIDI velocity estimation with an end-to-end DNN model using FiLM conditioning.
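FiLM conditioning itself is a well-defined mechanism: the conditioning input (here, score information) generates per-channel scale and shift parameters applied to intermediate features. A minimal PyTorch sketch of such a layer (the surrounding architecture of the paper is not shown):

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    # Feature-wise Linear Modulation: a conditioning vector produces
    # per-channel scale (gamma) and shift (beta) for the audio features.
    def __init__(self, cond_dim, n_channels):
        super().__init__()
        self.to_gamma = nn.Linear(cond_dim, n_channels)
        self.to_beta = nn.Linear(cond_dim, n_channels)

    def forward(self, features, cond):
        # features: (batch, channels, time); cond: (batch, cond_dim)
        gamma = self.to_gamma(cond).unsqueeze(-1)
        beta = self.to_beta(cond).unsqueeze(-1)
        return gamma * features + beta

film = FiLM(cond_dim=16, n_channels=64)
out = film(torch.randn(2, 64, 100), torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 64, 100])
```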
2023-01-16
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/55285
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
© H. Kim, M. Miron, and X. Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: H. Kim, M. Miron, and X. Serra, "Note level MIDI velocity estimation for piano performance", in Extended Abstracts for the Late-Breaking Demo Session of the 23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022.
oai:repositori.upf.edu:10230/56564
Pre-Training Strategies Using Contrastive Learning and Playlist Information for Music Classification and Similarity
Alonso-Jiménez, Pablo
Favory, Xavier
Foroughmand, Hadrien
Bourdalas, Grigoris
Serra, Xavier
Lidy, Thomas
Bogdanov, Dmitry
In this work, we investigate an approach that relies on contrastive learning and music metadata as a weak source of supervision to train music representation models. Recent studies show that contrastive learning can be used with editorial metadata (e.g., artist or album name) to learn audio representations that are useful for different classification tasks. In this paper, we extend this idea to using playlist data as a source of music similarity information and investigate three approaches to generate anchor and positive track pairs. We evaluate these approaches by fine-tuning the pre-trained models for music multi-label classification tasks (genre, mood, and instrument tagging) and music similarity. We find that creating anchor and positive track pairs by relying on co-occurrences in playlists provides better music similarity and competitive classification results compared to choosing tracks from the same artist as in previous works. Additionally, our best pre-training approach based on playlists provides superior classification performance for most datasets.
2023-04-25
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/56564
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
info:eu-repo/semantics/openAccess
oai:repositori.upf.edu:10230/56565
TAPE: An End-to-End Timbre-Aware Pitch Estimator
Tamer, Nazif C.
Özer, Yigitcan
Müller, Meinard
Serra, Xavier
Pitch estimation of a target musical source within a multi-source polyphonic signal is of great interest for music performance analysis. One possible approach for extracting the pitch of a target source is to first perform source separation and then estimate the pitch of the separated track. However, as we will show, this typically leads to poor results. As an alternative, we introduce a timbre-aware pitch estimator (TAPE), which estimates the pitch of a target source in an end-to-end manner without the need for an explicit source separation step. In contrast to existing approaches that assume the predominance of a lead voice, our approach builds upon cues that rely only on timbral characteristics. Our results on real violin-piano duets show that, without any pre-processing step, TAPE outperforms the sequential procedure of source separation and pitch estimation under many settings, even if the target source is not predominant.
2023-04-25
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/56565
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
info:eu-repo/semantics/openAccess
oai:repositori.upf.edu:10230/56803
Score-Informed MIDI Velocity Estimation for Piano Performance by FiLM Conditioning
Kim, Hyon
Miron, Marius
Serra, Xavier
Piano is one of the most popular instruments among people who learn to play music. When playing the piano, the level of loudness is crucial for expressing emotions as well as manipulating tempo; these elements convey the expressiveness of a music performance. Detecting the loudness of each note could provide more valuable feedback for music students, helping to improve their performance dynamics. This can be achieved by visualizing the loudness levels, not only for self-learning purposes but also for effective communication between teachers and students. Also, given the polyphonic nature of piano music, which often involves parallel melodic streams, determining the loudness of each note is more informative than analyzing the cumulative loudness of a specific time frame. This research proposes a method using a Deep Neural Network (DNN) with score information to estimate the note-level MIDI velocity of piano performances from audio input. In addition, when score information is available, we condition the DNN on it using a Feature-wise Linear Modulation (FiLM) layer. To the best of our knowledge, this is the first attempt to estimate MIDI velocity using a neural network in an end-to-end fashion. The model proposed in this study achieved improved accuracy in both MIDI velocity estimation and estimation error deviation, as well as higher recall for note classification, when compared to a DNN model that did not use score information.
2023-05-12
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/56803
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
https://creativecommons.org/licenses/by/3.0/
info:eu-repo/semantics/openAccess
© 2023 Hyon Kim et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
oai:repositori.upf.edu:10230/57594
Com ho avaluem? Repensem la universitat: impacte de la intel·ligència artificial (IA) en l’aprenentatge
Hernández Leo, Davinia
What impact does Artificial Intelligence (AI) have on assessment? How should we adapt assessment methods so that they remain effective and, in addition, test the competences related to the use of AI?
2023-07-17
info:eu-repo/semantics/conferenceObject
http://hdl.handle.net/10230/57594
cat
info:eu-repo/semantics/openAccess
This article is subject to a Creative Commons licence.
oai:repositori.upf.edu:10230/57790
DiffVel: note-level MIDI velocity estimation for piano performance by a double conditioned diffusion model
Kim, Hyon
Serra, Xavier
In any piano performance, expressiveness is paramount for effectively conveying the intent of the performer, and one of the most significant aspects of expressiveness is the loudness at the individual key or note level. However, accurately detecting note-level loudness poses a considerable technical challenge due to the polyphonic nature of piano performances, wherein multiple notes are played simultaneously, as well as the similarity of harmonic elements. MIDI velocity is crucial for indicating loudness in piano notes. This study conducted experiments on estimating note-level MIDI velocity by expanding the DiffRoll model, a diffusion model for piano performance transcription. By adopting double conditioning (audio and score information) and implementing noise removal as a post-processing step, our findings highlight the model's potential in estimating note-level MIDI velocity.
2023-08-31
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/57790
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
oai:repositori.upf.edu:10230/58023
Efficient supervised training of audio transformers for music representation learning
Alonso Jiménez, Pablo
Serra, Xavier
Bogdanov, Dmitry
In this work, we address music representation learning using convolution-free transformers. We build on top of existing spectrogram-based audio transformers such as AST and train our models on a supervised task using patchout training similar to PaSST. In contrast to previous works, we study how specific design decisions affect downstream music tagging tasks instead of focusing on the training task. We assess the impact of initializing the models with different pre-trained weights, using various input audio segment lengths, using learned representations from different blocks and tokens of the transformer for downstream tasks, and applying patchout at inference to speed up feature extraction. We find that 1) initializing the model from ImageNet or AudioSet weights and using longer input segments are beneficial both for the training and downstream tasks, 2) the best representations for the considered downstream tasks are located in the middle blocks of the transformer, and 3) using patchout at inference allows faster processing than our convolutional baselines while maintaining superior performance. The resulting models, MAEST, are publicly available and obtain the best performance among open models in music tagging tasks.
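Patchout, as introduced by PaSST, drops a random subset of input patch tokens during training to regularise and accelerate the transformer. A minimal sketch of the idea (the exact MAEST/PaSST scheme, e.g. structured dropping of whole time or frequency rows, may differ):

```python
import torch

def patchout(tokens, keep_prob=0.5, n_special=1):
    # tokens: (batch, sequence, embedding), with `n_special` leading special
    # tokens (e.g. CLS) that are always kept. A random subset of the
    # remaining patch tokens survives; their order is preserved.
    special, patches = tokens[:, :n_special], tokens[:, n_special:]
    n_keep = max(1, int(keep_prob * patches.shape[1]))
    idx = torch.randperm(patches.shape[1])[:n_keep].sort().values
    return torch.cat([special, patches[:, idx]], dim=1)

# 1 CLS token + 16 patches -> 1 + 8 tokens after 50% patchout.
print(patchout(torch.randn(2, 17, 8)).shape)  # torch.Size([2, 9, 8])
```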
2023-10-03
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/58023
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
https://creativecommons.org/licenses/by/4.0/
info:eu-repo/semantics/openAccess
© P. Alonso-Jiménez, X. Serra, and D. Bogdanov. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: P. Alonso-Jiménez, X. Serra, and D. Bogdanov, "Efficient Supervised Training of Audio Transformers for Music Representation Learning", in Proc. of the 24th Int. Society for Music Information Retrieval Conf., Milan, Italy, 2023.
oai:repositori.upf.edu:10230/58108
Sounds out of pläce? Score-independent detection of conspicuous mistakes in piano performances
Morsi, Alia
Tatsumi, Kana
Maezawa, Akira
Fujishima, Takuya
Serra, Xavier
In piano performance, some mistakes stand out to listeners, whereas others may go unnoticed. Prior research concluded that the salience of mistakes depends on factors including their contextual appropriateness and a listener's degree of familiarity with what is being performed. A conspicuous error is considered to be a region where there is something obviously wrong with the performance, which a listener can detect regardless of their degree of knowledge of what is being performed. Accordingly, this paper attempts to build a score-independent conspicuous-error detector for standard piano repertoire of beginner to intermediate students. We gather three qualitatively different sets of piano-playing MIDI data: (1) 103 sight-reading sessions by beginner and intermediate adult pianists with formal music training, (2) 245 performances by presumably late-beginner to early-advanced pianists on a digital piano, and (3) 50 etude performances by an advanced pianist. The data was annotated at the regions considered to contain conspicuous mistakes. We then use a Temporal Convolutional Network (TCN) to detect the sites of such mistakes from the piano roll. We investigate two pre-training methods to overcome data scarcity: (1) synthetic data with procedurally generated mistakes, and (2) training part of the model as a piano-roll auto-encoder. Experimental evaluation shows that the TCN performs at an F-measure of 0.78 without pre-training on sight-reading data, while the proposed pre-training steps improve the F-measure on performance and etude data, approaching the agreement between human raters on conspicuous-error labels. Importantly, we report on the lessons learned from this pilot study and what should be addressed to continue this research direction.
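The first pre-training method relies on procedurally generated mistakes. A toy sketch of how such corruption might be injected into symbolic note data, with all rates and shift ranges invented for illustration (the paper's actual generation procedure is not reproduced here):

```python
import random

def inject_mistakes(notes, rate=0.05, max_shift=2):
    # notes: list of (pitch, onset, duration) events. A small fraction of
    # pitches is perturbed to create synthetic 'mistakes'; labels mark which
    # events were corrupted, providing supervision for pre-training.
    corrupted, labels = [], []
    for pitch, onset, dur in notes:
        if random.random() < rate:
            pitch += random.choice([-max_shift, -1, 1, max_shift])
            labels.append(1)   # region contains an injected error
        else:
            labels.append(0)
        corrupted.append((pitch, onset, dur))
    return corrupted, labels
```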
2023-10-20
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/58108
eng
http://creativecommons.org/licenses/by/4.0/
info:eu-repo/semantics/openAccess
© A. Morsi, K. Tatsumi, A. Maezawa, T. Fujishima, and X. Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
oai:repositori.upf.edu:10230/58109
TRIAD: capturing harmonics with 3D convolutions
Perez, Miguel
Kirchhoff, Holger
Serra, Xavier
Thanks to advancements in deep learning (DL), automatic music transcription (AMT) systems recently outperformed previous ones fully based on manual feature design. Many of these highly capable DL models, however, are computationally expensive. Researchers are moving towards smaller models capable of maintaining state-of-the-art (SOTA) results by embedding musical knowledge in the network architecture. Existing approaches employ convolutional blocks specifically designed to capture the harmonic structure. These approaches, however, require either large kernels or multiple kernels, with each kernel aiming to capture a different harmonic. We present TriAD, a convolutional block that achieves an unequally distanced dilation over the frequency axis. This allows our method to capture multiple harmonics with a single yet small kernel. We compare TriAD with other methods of capturing harmonics, and we observe that our approach maintains SOTA results while reducing the number of parameters required. We also conduct an ablation study showing that our proposed method effectively relies on harmonic information.
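The motivation for unequally spaced dilation is that, on a log-frequency axis, the harmonics of any f0 fall at fixed but non-uniform bin offsets. A small sketch computing those offsets (the bins-per-octave value is chosen arbitrarily; TriAD's actual kernel construction is in the paper):

```python
import numpy as np

def harmonic_offsets(n_harmonics=6, bins_per_octave=36):
    # The k-th harmonic of any f0 lies log2(k) octaves above it, i.e. at
    # unequally spaced distances in log-frequency bins, independent of f0.
    return [round(bins_per_octave * np.log2(k))
            for k in range(1, n_harmonics + 1)]

print(harmonic_offsets())  # [0, 36, 57, 72, 84, 93]
```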
2023-10-20
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/58109
eng
https://creativecommons.org/licenses/by/4.0/
info:eu-repo/semantics/openAccess
© M. Perez, H. Kirchhoff, and X. Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
oai:repositori.upf.edu:10230/58121
High-resolution violin transcription using weak labels
Tamer, Nazif Can
Özer, Yigitcan
Müller, Meinard
Serra, Xavier
A descriptive transcription of a violin performance requires detecting not only the notes but also the fine-grained pitch variations, such as vibrato. Most existing deep learning methods for music transcription do not capture these variations and often need frame-level annotations, which are scarce for the violin. In this paper, we propose a novel method for high-resolution violin transcription that can leverage piece-level weak labels for training. Our conformer-based model works on the raw audio waveform and transcribes violin notes and their corresponding pitch deviations with 5.8 ms frame resolution and 10-cent frequency resolution. We demonstrate that our method (1) outperforms generic systems in the proxy tasks of violin transcription and pitch estimation, and (2) can automatically generate new training labels by aligning its feature representations with unseen scores. We share our model along with a 34-hour dataset of score-aligned solo violin performances, notably including the 24 Paganini Caprices.
2023-10-24
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/58121
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
https://creativecommons.org/licenses/by/4.0/
info:eu-repo/semantics/openAccess
© Nazif Can Tamer, Yigitcan Özer, Meinard Müller, Xavier Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
oai:repositori.upf.edu:10230/58122
Predicting performance difficulty from piano sheet music images
Ramoneda, Pedro
Valero-Mas, Jose J.
Jeong, Dasaem
Serra, Xavier
Estimating the performance difficulty of a musical score is crucial in music education for adequately designing the learning curriculum of the students. Although the Music Information Retrieval community has recently shown interest in this task, existing approaches mainly use machine-readable scores, leaving the broader case of sheet music images unaddressed. Based on previous works involving sheet music images, we use a mid-level representation, bootleg score, describing notehead positions relative to staff lines, coupled with a transformer model. This architecture is adapted to our task by introducing an encoding scheme that reduces the encoded sequence length to one-eighth of the original size. In terms of evaluation, we consider five datasets (more than 7500 scores with up to 9 difficulty levels), two of them particularly compiled for this work. The results obtained when pretraining the scheme on the IMSLP corpus and fine-tuning it on the considered datasets prove the proposal's validity, achieving the best-performing model with a balanced accuracy of 40.34% and a mean square error of 1.33. Finally, we provide access to our code, data, and models for transparency and reproducibility.
2023-10-24
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/58122
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
https://creativecommons.org/licenses/by/4.0/
info:eu-repo/semantics/openAccess
© P. Ramoneda, J. J. Valero-Mas, D. Jeong and X. Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
oai:repositori.upf.edu:10230/58123
TapTamDrum: a dataset for dualized drum patterns
Haki, Behzad
Kotowski, Błażej
Lee, Cheuk Lun Isaac
Jordà Puig, Sergi
Drummers spend extensive time practicing rudiments to develop technique, speed, coordination, and phrasing. These rudiments are often practiced on "silent" practice pads using only the hands. Additionally, many percussive instruments across cultures are played exclusively with the hands. Building on these concepts and inspired by Einstein's probably apocryphal quote, "Make everything as simple as possible, but not simpler," we hypothesize that a dual-voice reduction could serve as a natural and meaningful compressed representation of multi-voiced drum patterns. This representation would retain more information than its corresponding monotonic representation while maintaining relative simplicity for tasks such as rhythm analysis and generation. To validate this potential representation, we investigate whether experienced drummers can consistently represent and reproduce the rhythmic essence of a given drum pattern using only their two hands. We present TapTamDrum: a novel dataset of repeated dualizations from four experienced drummers, along with preliminary analysis and tools for further exploration of the data.
2023-10-24
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/58123
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
https://creativecommons.org/licenses/by/4.0/
info:eu-repo/semantics/openAccess
© B. Haki, B. Kotowski, C. L. I. Lee, and S. Jordà. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
oai:repositori.upf.edu:10230/58131
Activities using smart IoT planters in learning spaces: human-centred design of a dashboard
Hernández Leo, Davinia
Ferrer, Josep
Vujovic, Milica
Tabuenca, Bernardo
Ortiz-Beltran, Ariel
Greller, Wolfgang
Carrió, Mar
Moyano Claramunt, Elisabet
Education plays a transversal key role in the UN's Sustainable Development Goals agenda. Educating young people on natural health and the interpretation of scientific evidence can contribute to increased levels of informed social sensitivity towards our natural environment and awareness about its effect on our planet and human wellbeing. To enable experience-based environmental awareness learning activities, this paper proposes the development of a dashboard that visualizes data captured by sensors located in plants (smart IoT planters) available in learning spaces. The possibilities for learning activities using smart planters are diverse (plant care, ambient or emotional implications, data analysis, etc.) and their design can influence the shape of desirable dashboard features. This paper offers an answer for this type of dashboard design based on a human-centred methodology which involves stakeholders (experts and practitioners) in its co-design through guided hands-on workshops. The results show insights into which types of learning activities supported by smart planters can be especially valuable to educators and which design principles should be considered in the creation of the supporting dashboard. Resulting representative proposals for activities include plant monitoring, correlation of sensed data and observations, and collaborative tasks. Key values perceived by participants include expected high levels of student engagement, critical thinking and familiarity with the scientific method. Design principles for a supporting dashboard include the use of a traffic-light metaphor and enabling data collection that could serve to contrast variables and observations at a moment in time and across time. The paper illustrates how the results achieved can lead to the design of a human-centred dashboard for situated environmental awareness education.
2023-10-25
info:eu-repo/semantics/report
http://hdl.handle.net/10230/58131
eng
info:eu-repo/grantAgreement/ES/2PE/PID2020-112584RB-C33
https://creativecommons.org/licenses/by-sa/4.0/
info:eu-repo/semantics/openAccess
CC Attribution-ShareAlike 4.0 International licence (CC BY-SA 4.0)
oai:repositori.upf.edu:10230/58186
Efficient notation assembly in optical music recognition
Penarrubia, Carlos
Garrido-Muñoz, Carlos
Valero-Mas, Jose J.
Calvo Zaragoza, Jorge
Optical Music Recognition (OMR) is the field of research that studies how to computationally read music notation from written documents. Thanks to recent advances in computer vision and deep learning, there are successful approaches that can locate the music-notation elements in a given music score image. Once detected, these elements must be related to each other to reconstruct the musical notation itself, in the so-called notation assembly stage. However, despite its relevance to the eventual success of OMR, this stage has barely been addressed in the literature. This work presents a set of neural approaches to perform this assembly stage. Taking into account the number of possible syntactic relationships in a music score, we give special importance to the efficiency of the process in order to obtain models that are useful in practice. Our experiments, using the MUSCIMA++ handwritten sheet music dataset, show that the considered approaches are capable of outperforming the existing state of the art in terms of efficiency with limited (or no) performance degradation. We believe that the conclusions of this work provide novel insights into the notation assembly step, while offering clues on how to approach the previous stages of OMR and improve the overall performance.
2023-10-30
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/58186
eng
https://creativecommons.org/licenses/by/4.0/
info:eu-repo/semantics/openAccess
© C. Penarrubia, C. Garrido-Muñoz, J.J. Valero-Mas, and J. Calvo-Zaragoza. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
oai:repositori.upf.edu:10230/58188
Carnatic singing voice separation using cold diffusion on training data with bleeding
Plaja-Roglans, Genís
Miron, Marius
Shankar, Adithi
Serra, Xavier
Supervised music source separation systems using deep learning are trained by minimizing a loss function between pairs of predicted separations and ground-truth isolated sources. However, open datasets comprising isolated sources are few, small, and restricted to a few music styles. At the same time, multi-track datasets with source bleeding are usually larger in size and easier to compile. In this work, we address the task of singing voice separation when the ground-truth signals have bleeding and only the target vocals and the corresponding mixture are available. We train a cold diffusion model on the frequency domain to iteratively transform a mixture into the corresponding vocals with bleeding. Next, we build the final separation masks by clustering spectrogram bins according to their evolution along the transformation steps. We test our approach on a Carnatic music scenario for which solely datasets with bleeding exist, while current research on this repertoire commonly uses source separation models trained solely on Western commercial music. Our evaluation on a Carnatic test set shows that our system improves on Spleeter in interference removal and is competitive in terms of signal distortion. Code is open sourced.
2023-10-30
info:eu-repo/semantics/preprint
http://hdl.handle.net/10230/58188
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
© G. Plaja-Roglans, M. Miron, A. Shankar, and X. Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
oai:repositori.upf.edu:10230/58318
Completing audio drum loops with symbolic drum suggestions
Haki, Behzad
Pelinski, Teresa
Nieto, Marina
Jordà Puig, Sergi
Sampled drums can be used as an affordable way of creating human-like drum tracks or, perhaps more interestingly, as a means of experimentation with rhythm and groove. Similarly, AI-based drum generation tools can focus on creating human-like drum patterns or, alternatively, focus on providing producers/musicians with means of experimentation with rhythm. In this work, we aimed to explore the latter approach. To this end, we present a suite of Transformer-based models aimed at completing audio drum loops with stylistically consistent symbolic drum events. Our proposed models rely on a reduced spectral representation of the drum loop, striking a balance between a raw audio recording and an exact symbolic transcription. Using a number of objective evaluations, we explore the validity of our approach and identify several challenges that need to be further studied in future iterations of this work. Lastly, we provide a real-time VST plugin that allows musicians/producers to utilize the models in real-time production settings.
2023-11-20
info:eu-repo/semantics/conferenceObject
http://hdl.handle.net/10230/58318
eng
https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Copyright remains with the author(s).
oai:repositori.upf.edu:10230/58657
Computers in education: how can we support the teachers?
Hernández Leo, Davinia
2024-01-09
info:eu-repo/semantics/report
Hernández-Leo D. Computers in Education: How can we support the teachers? In: 31st International Conference on Computers in Education (ICCE), Dec. 7th 2023, Matsue, Japan.
http://hdl.handle.net/10230/58657
eng
https://creativecommons.org/licenses/by-sa/4.0/
info:eu-repo/semantics/openAccess
CC Attribution-ShareAlike 4.0 International licence (CC BY-SA 4.0)
oai:repositori.upf.edu:10230/59220
Leveraging pre-trained autoencoders for interpretable prototype learning of music audio
Alonso Jiménez, Pablo
Pepino, Leonardo
Batlle-Roca, Roser
Zinemanas, Pablo
Bogdanov, Dmitry
Serra, Xavier
Rocamora, Martín
We present PECMAE, an interpretable model for music audio classification based on prototype learning. Our model builds on a previous method, APNet, which jointly learns an autoencoder and a prototypical network. Instead, we propose to decouple the two training processes. This enables us to leverage existing self-supervised autoencoders pre-trained on much larger data (EnCodecMAE), providing representations with better generalization. APNet achieves interpretability by reconstructing prototypes to waveforms, relying on the nearest training data samples. In contrast, we explore using a diffusion decoder that allows reconstruction without such dependency. We evaluate our method on datasets for music instrument classification (Medley-Solos-DB) and genre recognition (GTZAN and a larger in-house dataset), the latter being a more challenging task not previously addressed with prototypical networks. We find that the prototype-based models preserve most of the performance achieved with the autoencoder embeddings, while the sonification of prototypes benefits understanding of the classifier's behavior.
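Prototype-based classification reduces, at inference time, to comparing an embedding against learned class prototypes. A minimal sketch of that decision rule (PECMAE's training procedure and diffusion-based sonification are not shown):

```python
import torch

def prototype_logits(embeddings, prototypes):
    # embeddings: (batch, dim); prototypes: (n_classes, dim), learned.
    # Classify by negative squared distance: the nearest prototype wins.
    d2 = torch.cdist(embeddings, prototypes) ** 2
    return -d2

# Toy check: 2 samples, 3 class prototypes in a 4-d embedding space.
logits = prototype_logits(torch.randn(2, 4), torch.randn(3, 4))
print(logits.argmax(dim=1))  # predicted class index per sample
```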
2024
info:eu-repo/semantics/conferenceObject
Alonso-Jiménez P, Pepino L, Batlle-Roca R, Zinemanas P, Bogdanov D, Serra X, Rocamora M. Leveraging pre-trained autoencoders for interpretable prototype learning of music audio. Paper presented at: ICASSP Workshop on Explainable AI for Speech and Audio (XAI-SA); 2024 Apr 15; Seoul, Korea.
http://hdl.handle.net/10230/59220
eng
info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
info:eu-repo/semantics/openAccess
All rights reserved
Institute of Electrical and Electronics Engineers (IEEE)