Barriers to Progress in Speaker Identification with Comments on the Trayvon Martin Case

Harry Hollien

Abstract


Linguistics and phonetics overlap in many areas. This essay reviews some of the problems experienced by phoneticians in one of these areas; it may provide some insight for linguists when they confront barriers in their own field. The example involves individuals who attempt to identify speakers from voice analysis. The fundamental challenge they face is, of course, the thousands of variables associated with that task. These include differences among speakers in gender, age, size, physiology, language, dialect, psychological/health states, background/education, reason for speaking, situation, environment, and configuration of the acoustic channel, plus many others. Many formal assessment procedures, both aural-perceptual ones conducted by humans and machine/computer-based systems, have been proposed and/or used for these analyses. Unfortunately, few have enjoyed particularly high levels of success. Worse yet, reasonable progress has suffered from external impediments; the report outlines some of them. Among the problems considered are: 1) competition (verification vs. identification, from voiceprints), 2) concept disputes, 3) the continued undervaluation of relevant evidence, and 4) markedly dissimilar philosophies of professionals from different disciplines. In response, a short review is presented of the data and concepts that clearly support the possibility of robust speaker identification. Also included are suggestions on how to enhance the effectiveness of disciplines such as ours.


Keywords


speaker identification, automated speech processing, expert witnesses, Trayvon Martin




DOI: https://doi.org/10.5195/lesli.2013.3
