1. Hüwel, K. Adiloğlu and J. Bach, "Hearing aid research data set for acoustic environment
recognition," in in IEEE Int. Conf. Acoust, Speech Signal Process. (ICASSP), Barcelona, Spain,
2006.
2. D. Bogdanov, N. Wack, E. G'omez, S. Gulati, P. Herrera, O. Mayor, G. Roma, J. Salamon, J. Zapata
and X. Serra, "ESSENTIA: an audio analysis library for music information retrieval," in Proc
ISMIR, Curitiba, Brazil, 2013.
3. M. Crocco, M. Cristani, A. Trucco and V. Murino, "Audio surveillance: A systematic review," ACM
Computing Surveys (CSUR), vol. 48, no. 4, pp. 1-46, 2016.
4. F. Shkurti, W. Chang, P. Henderson, M. Islam, J. Higuera, J. Li, T. Manderson, A. Xu, G. Dudek
and J. Sattar, "Underwater multi-robot convoying using visual tracking by detection," in in
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017.
5. C. Mydlarz, J. Salamon and J. P. Bello, "The implementation of lowcost urban acoustic monitoring
devices," Applied Acoustics, vol. 117, pp. 207-218, 2016.
6. M. Valenti, A. Diment, G. Parascandolo, S. Squartini and T. Virtanen, "DCASE2016 acoustic scene
classification us-ing convolutional neural networks," in Proc. Workshop DCASE, 2016, pp. 95-99.
7. H. Riazati Seresht and K. Mohammadi, "Environmental sound classification with low-complexity
convolutional neural network empowered by sparse salient region pooling," IEEE Access, vol. 11,
pp. 849-862, 2022.
8. Al-Hattab, Y. A., H. F. Zaki and A. A. Shafie, "Rethinking environmental sound classification using
convolutional neural networks: optimized parameter tuning of single feature extraction," Neural
Computing and Applications, vol. 33, no. 21, pp. 14495-14506, 2021.
9. K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image
recognition," in Proc. ICLR, 2015.
10. G. Huang, Z. Liu, L. Van Der Maaten and K. Q. Weinberger, "Densely connected convolutional
networks," in Proc. CVPR, 2017, pp. 2261-2269.
11. K. He, X. Zhang, S. Ren and J. Sun, "Deep residual learning for image recognition," in Proc. CVPR,
2016, pp. 770-778.
12. J. K. Piczak, "ESC: Dataset for environmental sound," in in Proc. 23rd ACM Int. Conf. Multimed.,
2015.
13. Kumar, M. Khadkevich and C. Fügen, "Knowledge transfer from weakly labeled audio using
convolutional neural network for sound events and scenes," in in IEEE Int. Conf. Acoust, Speech
Signal Process. (ICASSP), 2018.
14. L. I. Xuhong, Y. Grandvalet and F. Davoine, "Explicit inductive bias for transfer learning with
convolutional networks," in In Int. Conf. Mach. Learn., 2018.
15. J. Yosinski, J. Clune, Y. Bengio and H. Lipson, "How transferable are features in deep neural
networks?," in Proc. NIPS, 2014, pp. 3320-3328.
16. Eronen, V. Peltonen, J. Tuomi, A. Klapuri, S. Fagerlund, T. Sorsa, G. Lorho and J. Huopaniemi,
"Audio-based context recognition," IEEE Trans. on Audio, Speech, and Lang. Process., vol. 14, no.
1, pp. 321-329, 2005.
17. K. Lee and D. Ellis, "Audio-based semantic concept classification for consumer video," IEEE Trans.
on Audio, Speech, and Lang. Process., vol. 18, no. 6, pp. 1406-1416, 2009.
18. I. McLoughlin, "Line spectral pairs," Signal Proces., vol. 88, no. 3, pp. 448-467, 2008
19. Temko, E. Monte and C. Nadeu, "Comparison of sequence discriminant support vector machines
for acoustic event classification," in In IEEE Int. Conf. Acoust, Speech Signal Process. (ICASSP),
2006.
20. G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen,
T. Sainath and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition:
The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97,
2012.
21. J. Piczak, "Environmental sound classification with convolutional neural networks," in In 2015
IEEE 25th Int. Workshop Mach. Learn. Signal Process. (MLSP), 2015.
22. Y. Tokozume and T. Harada, "Learning environmental sounds with end-to-end convolutional neural
network," in In IEEE Int. Conf. Acoust, Speech Signal Process. (ICASSP), 2017.
23. X. Li, V. Chebiyyam and K. Kirchhoff, "Multi-stream network with temporal attention for
environmental sound classification," arXiv:1901.08608, 2019.
24. J. Sharma, O. C. Granmo and M. Goodwin, "Environment Sound Classification Using Multiple
Feature Channels and Attention Based Deep Convolutional Neural Network," in In Interspeech,
2020.
25. T. Qiao, S. Zhang, S. Cao and S. Xu, "High Accurate Environmental Sound Classification: Sub-
Spectrogram Segmentation versus Temporal-Frequency Attention Mechanism," Sensors, vol. 21,
no. 16, 2021.
26. W. Mu, B. Yin, X. Huang, J. Xu and Z. Du, "Environmental sound classification using temporalfrequency
attention based convolutional neural network," Scientific Reports, vol. 11, no. 1, 2021.
27. H. Wang, Y. Zou, D. Chong and W. Wang, "Environmental sound classification with parallel
temporal-spectral attention," arXiv preprint arXiv:1912.06808, 2020.
28. J. Salamon and J. P. Bello, "Deep convolutional neural networks and data augmentation for
environmental sound classification," IEEE Signal Process. Letters, vol. 24, no. 3, pp. 279-283, 2017.
29. Z. Mushtaq and S. F. Su, "Environmental sound classification using a regularized deep
convolutional neural network with data augmentation," Applied Acoustics, vol. 167, 2020.
30. Madhu and S. Kumaraswamy, "Data augmentation using generative adversarial network for
environmental sound classification," in In 2019 27th European Signal Processing Conference
(EUSIPCO), 2019.
31. S. Mun, S. Shon, W. Kim, D. K. Han and H. Ko, "Deep neural network based learning and
transferring mid-level audio features for acoustic scene classification," in In IEEE Int. Conf. Acoust,
Speech Signal Process. (ICASSP), 2017.
32. Z. Mushtaq, S. F. Su and Q. V. Tran, "Spectral images based environmental sound classification
using CNN with meaningful data augmentation," Applied Acoustics, vol. 172, 2021.
33. J. Gemmeke, D. Ellis, D. Freedman, A. Jansen, W. Lawrence, R. Moore, M. Plakal and M. Ritter,
"Audio set: An ontology and human-labeled dataset for audio events," in In IEEE Int. Conf. Acoust,
Speech Signal Process. (ICASSP), 2017.
34. Q. Kong, Y. Cao, T. Iqbal, Y. Wang, W. Wang and M. Plumbley, "Panns: Large-scale pretrained
audio neural networks for audio pattern recognition," IEEE/ACM Transactions on Audio, Speech,
and Language Processing, vol. 28, pp. 2880-2894, 2020.
35. Y. K. Wang and K. C. Fan, "Applying genetic algorithms on pattern recognition: An analysis and
survey," in In Proc. 13th Int. Conf. Pattern Recognit., 1996.
36. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever and R. Salakhutdinov, "Dropout: a simple
way to prevent neural networks from overfitting," J. Mach. Lerning Res., vol. 15, no. 1, pp. 1929-
1958, 2014.
37. Y. Sun, B. Xue, M. Zhang and G. G. Yen, "Completely automated CNN architecture design based
on blocks," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 4, pp. 1242-1254, 2020.
38. Y. Sun, G. G. Yen and Z. Yi, "Evolving unsupervised deep neural networks for learning meaningful
representations," IEEE Trans. Evol. Comput., vol. 23, no. 1, pp. 89-103, 2018.
39. S. Katoch, S. S. Chauhan and V. Kumar, "A review on genetic algorithm: past, present, and future,"
Multimedia Tools and Applications, vol. 80, no. 5, pp. 8091-8126, 2021.
40. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah, E. Vincent, B. Raj and T. Virtanen, "DCASE
2017 challenge setup: Tasks, datasets and baseline system," in In Workshop on Detection and
Classification of Acoustic Scenes and Events (DCASE), 2018.
41. S. Hershey, S. Chaudhuri, D. Ellis, J. Gemmeke, A. Jansen, R. Moore, M. Plakal, D. Platt, R.
Saurous, B. Seybold and M. Slaney, "CNN architectures for large-scale audio classification," in In
IEEE Int. Conf. Acoust, Speech Signal Process. (ICASSP), 2017.
42. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M.
Isard and M. Kudlur, "Tensorflow: A system for large-scale machine learning," in in Proc. 12th
USENIX Symp. Oper. Syst. Des. Implement., 2016, pp. 265-283.
43. Y. Aytar, C. Vondrick and A. Torralba, "Soundnet: Learning sound representations from unlabeled
video," in Proc. NIPS, 2016, pp. 892-900.
44. S. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans Knowl. Data Eng., vol. 22, no. 10,
pp. 1345-1359, 2010.
45. Z. Mushtaq and S. F. Su, "Efficient classification of environmental sounds through multiple features
aggregation and data enhancement techniques for spectrogram images," Symmetry, vol. 12, no. 11,
2020.