Abstract
This paper introduces an acoustic event detection system capable of processing a continuous input audio stream in order to detect potentially dangerous acoustic events. The system is a lightweight, easily extensible, long-term running and complete solution for acoustic event detection. It is based on its own approach to the detection and classification of acoustic events: a modified Viterbi decoding process used in combination with Weighted Finite-State Transducers (WFSTs) to support extensibility, and acoustic modeling based on Hidden Markov Models (HMMs). The system is programmed entirely in C++ and was designed to be self-sufficient, requiring no additional dependencies. It also includes a signal preprocessing part for extracting Mel-Frequency Cepstral Coefficients (MFCC), Frequency Bank Coefficients (FBANK) and Mel-Spectral Coefficients (MELSPEC). To increase robustness, the system includes Cepstral Mean Normalization (CMN) and our proposed removal of basic coefficients from the feature vector.
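To make the CMN step mentioned in the abstract concrete, the following is a minimal sketch of textbook batch cepstral mean normalization in C++. It is not taken from the EAR-TUKE sources: the names `Frame` and `cepstralMeanNormalize` are hypothetical, and the sketch assumes all frames are available at once. A system that processes a continuous stream would more plausibly use a running or windowed mean estimate, which the abstract does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One feature vector (for example, MFCCs) per analysis frame.
// Hypothetical type alias used only for this illustration.
using Frame = std::vector<float>;

// Batch CMN: subtract the per-coefficient mean, computed over all frames,
// from every frame. Assumes every frame has the same dimension.
std::vector<Frame> cepstralMeanNormalize(const std::vector<Frame>& frames) {
    if (frames.empty()) return {};

    const std::size_t dim = frames.front().size();

    // Accumulate the sum of each coefficient in double precision.
    std::vector<double> mean(dim, 0.0);
    for (const Frame& f : frames) {
        assert(f.size() == dim);
        for (std::size_t i = 0; i < dim; ++i) mean[i] += f[i];
    }
    for (double& m : mean) m /= static_cast<double>(frames.size());

    // Copy the input and remove the mean from each coefficient.
    std::vector<Frame> out = frames;
    for (Frame& f : out) {
        for (std::size_t i = 0; i < dim; ++i) {
            f[i] = static_cast<float>(f[i] - mean[i]);
        }
    }
    return out;
}
```

After normalization each coefficient has zero mean across the processed frames, which removes a constant offset in the cepstral domain such as the one introduced by a fixed recording channel.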
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Lojka, M., Pleva, M., Kiktová, E., Juhár, J., Čižmár, A. (2014). EAR-TUKE: The Acoustic Event Detection System. In: Dziech, A., Czyżewski, A. (eds) Multimedia Communications, Services and Security. MCSS 2014. Communications in Computer and Information Science, vol 429. Springer, Cham. https://doi.org/10.1007/978-3-319-07569-3_11
DOI: https://doi.org/10.1007/978-3-319-07569-3_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07568-6
Online ISBN: 978-3-319-07569-3
eBook Packages: Computer Science (R0)