Detecting Deception by Analyzing Written Statements in Korean

This paper examines the effectiveness of Scientific Content Analysis (SCAN) and its cross-linguistic applicability by analyzing written statements in Korean. We conducted an experiment in which truth tellers were asked to write a true statement about a staged event and liars a fabricated statement about the same event. We analyzed both types of written statements using the SCAN criteria. The results (accuracy rate, 81.6%) indicate that SCAN is effective in detecting deception, despite the low internal consistency between coders (Cronbach's alpha, 0.577). The results also show that the SCAN criteria are not universally applicable across languages: the use of pronouns in Korean yields no significant difference between truthful and deceptive statements.


Introduction
Deception detection is rarely simple because it involves a variety of verbal and nonverbal indicators of deception. Most research has focused on nonverbal and physiological indicators such as facial expression, body movement, vocal tone, heart rate, blood pressure, perspiration, etc. The polygraph is one of the best-known devices that rely on such indicators for deception detection. Other major approaches include audio- and/or video-taped interviews in which the experimenter observes subjects' nonverbal behavior (Broadhurst & Cheng, 2005; Warren et al., 2009; Kolkman, 2012). These approaches, however, have been largely discredited because of false positives as well as false negatives, which arise from stress and nervousness during the test (Kraut, 1980; DePaulo et al., 1985; Vrij, 2000). Ekman (1992: 81) notes that liars tend to be most careful about their choice of 'words', because words are the richest and most differentiated means of communication. Furthermore, written statements are expected to depend heavily on lexicalization and complex syntactic structures in order to maintain 'cohesion' and 'coherence' (Chafe, 1982; Gumperz et al., 1984), which should in turn produce more linguistic leakage of deception. It follows that written statements can serve as a good tool for identifying strong indicators of deception, since the writing task substantially reduces the stress and nervousness caused by a face-to-face interview and thereby enhances the reliability of deception cues.
This paper aims to detect deception by analyzing subjects' written statements. For deception detection, we use SCAN (Scientific Content Analysis), put forward by Sapir (1987), which comprises 13 analytical criteria. We chose SCAN as our detection tool because it is the tool most often used for deception detection in Korean (Lee, 2007; Kim, 2010; Lee et al., 2010; Kwon, 2012; Lee, 2013). We address two questions: (i) whether SCAN is effective in detecting deception, and (ii) whether SCAN is applicable across languages, including Korean. Our discussion focuses mainly on these two issues.
SCAN was developed by Sapir (1987) to help investigators detect instances of potential deception within a written statement. Unlike interviews and interrogations, in which the investigator questions a suspect directly, the SCAN technique has the investigator examine a written statement produced by the suspect and read between its lines to uncover hidden deceptive intentions. SCAN consists of 13 working criteria by which the investigator can judge whether a statement may be deceptive.
Sapir (1987) argues that inconsistency in the writer's use of vocabulary is indicative of deception. Changes in vocabulary may indicate that some change has occurred in the writer's life or that the statement is not derived from real experience. Vocabulary changes occur most frequently in references to people, transportation, weapons, communication, etc. A shift in reference from 'my wife' to 'this woman', for example, may reveal a change in psychological distance between the writer and his wife after an event. A change in reference to a weapon likewise reveals psychological distance. If a suspect has used a knife as a kitchen utensil, for instance, he refers to it straightforwardly as 'knife'. If he has used it as a deadly weapon, however, he is often observed to refer to it as 'this', 'that', or 'the thing'.
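As a minimal sketch of how the vocabulary-change criterion might be operationalized, the Python function below assumes that a coder has already listed, in order, the expressions a statement uses for one referent; both that manual step and the name `reference_shifts` are hypothetical and not part of SCAN itself. It simply reports each point at which the expression changes, which is where the criterion would direct the investigator's attention.

```python
from typing import List, Tuple


def reference_shifts(mentions: List[str]) -> List[Tuple[int, str, str]]:
    """Return (position, old_expression, new_expression) for each change.

    `mentions` is a hypothetical, manually compiled list of the expressions
    used for a single referent, in the order they occur in the statement.
    Comparison ignores letter case.
    """
    shifts = []
    for i in range(1, len(mentions)):
        if mentions[i - 1].lower() != mentions[i].lower():
            shifts.append((i, mentions[i - 1], mentions[i]))
    return shifts


print(reference_shifts(["my wife", "my wife", "this woman"]))
# [(2, 'my wife', 'this woman')]
```

A reported shift is only a cue for closer reading; as noted above, a change in reference may also reflect a genuine change in the writer's circumstances.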

Placing of Emotions within the Statement
According to Sapir (1987), if the writer expresses emotion inappropriately, or places it inappropriately within a statement, the statement may be deceptive. If members of the writer's family are killed in an accident, for example, he is expected to express heartbreaking emotion. In one case, a mother's daughter had allegedly died from falling off a cliff. The mother would be expected to feel sad at her daughter's death. However, she expressed no such sadness and said indifferently, "The doctor said that my daughter had died." The point at which emotion is expressed within a statement also provides a critical clue. Smith (2001) provides a relevant example, in which a man reported injuries to his insurance company in order to obtain money. The injuries allegedly occurred as he tried to place a box on a high shelf. His report, however, revealed itself to be deceitful through the inappropriate placement of emotion: "... as I was trying to climb up the boxes with that heavy box I got very nervous. I was afraid I might fall because the boxes were beginning to move some ...". The references to emotion (got very nervous, was afraid) are taken to be misplaced because they appear before the description of the fall that caused the injuries. This yields an unnatural cause-and-effect order: feelings of nervousness and fear should naturally follow the fall, not precede it. Analogously, Lee (2013) argues that the statement "Somebody was searching my living room. So I was so scared" sounds more natural than "I was so scared. For somebody was searching my living room."
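The placement check can be sketched, under simplifying assumptions, as a comparison between the position of the first sentence that voices an emotion and the first sentence that describes the triggering event. The emotion lexicon `EMOTION_WORDS`, the analyst-supplied event keywords, and the naive sentence splitter below are hypothetical illustrations rather than part of SCAN; the usage lines apply the check to Lee's (2013) two versions.

```python
import re
from typing import List, Optional, Set

# Hypothetical emotion lexicon, for illustration only.
EMOTION_WORDS: Set[str] = {"scared", "afraid", "nervous", "sad", "angry"}


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def first_index(sentences: List[str], words: Set[str]) -> Optional[int]:
    # Index of the first sentence containing any of `words`, else None.
    for i, sentence in enumerate(sentences):
        if set(re.findall(r"[a-z']+", sentence.lower())) & words:
            return i
    return None


def emotion_precedes_event(text: str, event_words: Set[str]) -> bool:
    """True if an emotion is voiced before the event that should cause it."""
    sentences = split_sentences(text)
    emotion_at = first_index(sentences, EMOTION_WORDS)
    event_at = first_index(sentences, event_words)
    return emotion_at is not None and event_at is not None and emotion_at < event_at


natural = "Somebody was searching my living room. So I was so scared."
unnatural = "I was so scared. For somebody was searching my living room."
print(emotion_precedes_event(natural, {"searching"}))    # False
print(emotion_precedes_event(unnatural, {"searching"}))  # True
```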

Improper Use of Pronouns
According to Sapir (1987), pronouns represent responsibility, possession, and relationship. The first person singular pronoun 'I' receives particular attention from investigators because it is the personal pronoun writers use most frequently. Investigators have observed that a truthful person writes a statement using the pronoun 'I', whereas a deceptive person often avoids it. The lack of this pronoun may indicate that the writer is avoiding the truth. Adams (1996) argues that a deceptive writer, by avoiding 'I', is trying not to be fully committed to the facts, as in the following statement.

I got up at 7:00 when my alarm went off. I took a shower and got dressed. I decided to go out for breakfast. I went to the McDonald's on the corner. Met a man who lives nearby. Talked with him for a few minutes. I finished breakfast and drove to work.
As shown above, the pronoun 'I' is missing from the two sentences "Met a man who lives nearby." and "Talked with him for a few minutes." According to the SCAN technique, the missing pronoun strongly indicates that the writer is lying about having met the man and talked with him.
Sapir (1987) also claims that a deceptive writer tends to lack conviction about the incident. This lack of conviction is typically expressed through hedges such as "I think", "kind of", "I believe", "I don't remember", etc. Adams (1996) calls these phrases 'qualifiers' and says that they temper the action about to be described, thereby discounting the message before it is transmitted. Lee (2013) cautions against applying this criterion in every case. He argues that the writer may genuinely be unable to remember an incident that took place long ago; because such an incident lies beyond the range of his memory, these phrases can appear even in a truthful statement.
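The two cues discussed above, sentences lacking 'I' and qualifier phrases, lend themselves to a simple automated first pass. The sketch below assumes English text, a naive punctuation-based sentence splitter, and a qualifier list limited to the examples quoted above; the function names are hypothetical, and any match still requires the human judgment Lee (2013) calls for.

```python
import re
from typing import Dict, List

# Qualifiers quoted in the text; illustrative, not an exhaustive list.
QUALIFIERS = ["i think", "kind of", "i believe", "i don't remember"]


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentences_without_i(text: str) -> List[str]:
    """Sentences in which neither 'I' nor a contraction such as "I'm" occurs."""
    missing = []
    for sentence in split_sentences(text):
        tokens = re.findall(r"[A-Za-z']+", sentence)
        if not any(t == "I" or t.startswith("I'") for t in tokens):
            missing.append(sentence)
    return missing


def count_qualifiers(text: str) -> Dict[str, int]:
    """Count each qualifier phrase (case-insensitive, whole words only)."""
    lowered = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    counts = {}
    for phrase in QUALIFIERS:
        n = len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        if n:
            counts[phrase] = n
    return counts


ADAMS = ("I got up at 7:00 when my alarm went off. I took a shower and got dressed. "
         "I decided to go out for breakfast. I went to the McDonald's on the corner. "
         "Met a man who lives nearby. Talked with him for a few minutes. "
         "I finished breakfast and drove to work.")
print(sentences_without_i(ADAMS))
# ['Met a man who lives nearby.', 'Talked with him for a few minutes.']
```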

No Denial of Allegations
According to Sapir (1987), a truthful person clearly and directly denies allegations. A typical denial would be, "I did not do it." A deceptive person, on the contrary, is often evasive in responding to allegations, so his statement tends to be vague or ambiguous. In one case, a man was suspected of robbing a woman of money. Coincidentally, he possessed the same amount of money as she had had at the time of the robbery. He denied the robbery but said ambiguously, "I have only 18 dollars in my pocket" (Lee, 2013: 316).
Sapir (1987) claims that a deceptive person tends to describe an event out of sequence, deviating from its logical progression. Adams (1996) supports this claim, observing that a truthful person recounts the event concisely and chronologically. A deceptive person who was involved in the criminal event, by contrast, is obliged to hide that involvement. To this end, he skips relevant information and/or adds irrelevant information, which Adams (1996) calls 'extraneous' information.
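Returning to the denial criterion, a direct denial of the kind Sapir describes can be approximated with a pattern match. The sketch assumes a hypothetical, deliberately small list of verbs in `DENIAL_PATTERN`; a statement without a match is not thereby deceptive, but simply lacks the explicit denial that the criterion looks for.

```python
import re

# Hypothetical pattern for an explicit first-person denial; the verb list is
# deliberately small and would need extending for real statements.
DENIAL_PATTERN = re.compile(
    r"\bI\s+(?:did\s+not|didn't|never)\s+(?:do|take|steal|rob|touch)\b",
    re.IGNORECASE,
)


def has_direct_denial(statement: str) -> bool:
    """True if the statement contains a direct denial such as 'I did not do it'."""
    normalized = statement.replace("\u2019", "'")  # curly to straight apostrophe
    return DENIAL_PATTERN.search(normalized) is not None


print(has_direct_denial("I did not do it."))                     # True
print(has_direct_denial("I have only 18 dollars in my pocket"))  # False
```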

Social Introduction
According to Sapir (1987), a deceptive person often fails to introduce the people involved in the event. He describes them vaguely, using pronouns such as 'we' or 'they' without identifying their antecedents. When he does use proper names, he fails to introduce the people in relation to himself or to the others described. For example, a proper name accompanied by an appositive, as in "John, my cousin, was there.", indicates a clear social introduction. The deceptive person, however, often omits the appositive phrase 'my cousin'. Lee (2013) observes that the order of social introduction also deserves investigators' attention. According to him, this order is usually determined by degree of intimacy: the writer tends to list people in descending order of intimacy, starting with the most intimate person and finishing with the least intimate one.
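The appositive cue can be sketched as follows, assuming the analyst supplies the list of proper names that occur in the statement (automatic name recognition is out of scope here). The pattern `APPOSITIVE` is a hypothetical simplification that only recognizes the "Name, my/our/his/her relation," form used in the example above.

```python
import re
from typing import List

# Hypothetical simplification: recognizes only "Name, my/our/his/her relation,".
APPOSITIVE = re.compile(r"\b([A-Z][a-z]+), (?:my|our|his|her) [a-z]+,")


def unintroduced_names(text: str, names: List[str]) -> List[str]:
    """Names (supplied by the analyst) never introduced with an appositive."""
    introduced = {m.group(1) for m in APPOSITIVE.finditer(text)}
    return [name for name in names if name not in introduced]


sample = "John, my cousin, was there. Mary was there too."
print(unintroduced_names(sample, ["John", "Mary"]))  # ['Mary']
```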

Spontaneous Corrections
According to Sapir (1987), spontaneous correction can be an indicator of deception. Corrections can take the form of crossed-out words, changed words, deletions, insertions, etc. Lee (2013) argues that spontaneous correction reflects the writer's psychological load: he feels hard pressed to hide some important information. Spontaneous correction may also indicate that the writer has abandoned an original plan to tell the truth. When we encounter a correction, therefore, we should read between the lines to figure out what the writer intended to say at that point.

Structure of the Statement
Sapir (1987) claims that a truthful statement is usually well balanced among introduction, body, and conclusion. According to him, about 20% of the statement should introduce the event, 50% should describe the event itself, and the remaining 30% should deal with the conclusion of the event. The body of a deceptive statement is typically short, less than one-third of the whole statement. This indicates that the writer is trying to hide something that happened in the main event and consequently shortens the body.
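The 20/50/30 guideline reduces to simple proportions once a coder has decided where the introduction, body, and conclusion begin and end, which remains a manual judgment. The helper names below are hypothetical; the sketch computes each part's share of lines and flags a body shorter than one-third of the statement, as described above.

```python
from typing import Dict


def structure_profile(introduction: int, body: int, conclusion: int) -> Dict[str, float]:
    """Share of the statement's lines taken up by each part (ideal: 0.2/0.5/0.3)."""
    total = introduction + body + conclusion
    if total <= 0:
        raise ValueError("the statement must contain at least one line")
    return {
        "introduction": introduction / total,
        "body": body / total,
        "conclusion": conclusion / total,
    }


def body_too_short(introduction: int, body: int, conclusion: int) -> bool:
    """Flag a body shorter than one-third of the whole statement."""
    return structure_profile(introduction, body, conclusion)["body"] < 1 / 3


print(structure_profile(4, 10, 6))  # {'introduction': 0.2, 'body': 0.5, 'conclusion': 0.3}
print(body_too_short(6, 4, 10))     # True
```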

Tense Change
Sapir (1987) argues that a truthful statement is typically written in the first person singular, past tense. According to him, an unexpected change of tense is indicative of deception. The case of Susan Smith is a well-known example. Susan Smith, the mother of two missing children, tearfully told reporters, "My children wanted me. They needed me. And now I can't help them." The shift from the past to the present tense in this statement provided a decisive clue to investigators, who subsequently arrested her for murdering the children. Her use of the past tense for children who were supposedly only missing indicated that she knew they were dead. Had she not been sure of their death, she would have used the present tense.
Sapir (1987) distinguishes between objective and subjective time as a criterion for deception. Objective time is the actual clock time at which an event takes place, expressed as '8 o'clock', 'at noon', 'in the morning', etc. Subjective time, on the other hand, is reflected in the pace of the statement and is measured by the number of lines the writer devotes to a given period. Hyatt (2013), referring to data compiled since the 1920s, observes that 3 lines correspond to an hour of objective time. If an individual writes fewer than 3 lines per hour, for example, this is indicative of deception.
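The lines-per-hour heuristic can likewise be expressed as a rate check. The sketch assumes the statement has been segmented by hand into periods, each with its objective duration in hours and the number of lines devoted to it; the function name and that segmentation are hypothetical, and the threshold of 3 lines per hour is the figure cited from Hyatt (2013).

```python
from typing import List, Tuple

LINES_PER_HOUR = 3.0  # figure the text cites from Hyatt (2013)


def compressed_periods(periods: List[Tuple[str, float, int]]) -> List[str]:
    """Labels of periods written at fewer than LINES_PER_HOUR lines per hour.

    Each period is (label, objective_hours, lines_written), a hypothetical
    manual segmentation of the statement by clock time.
    """
    flagged = []
    for label, hours, lines in periods:
        if hours <= 0:
            continue  # no elapsed objective time, so the rate is undefined
        if lines / hours < LINES_PER_HOUR:
            flagged.append(label)
    return flagged


periods = [("07:00-08:00", 1.0, 5), ("08:00-12:00", 4.0, 2)]
print(compressed_periods(periods))  # ['08:00-12:00']
```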

Procedure
We divided the participants into two groups: 30 truth tellers and 30 liars. The participants were randomly assigned to either group and placed in two different rooms. The truth tellers were further divided into six groups of five members each. Each group was instructed to play a word-chain game. While playing, they were interrupted twice by two confederates. The first confederate entered the room five minutes after the game started and told them to stop playing. The truth tellers, however, continued to play for another five minutes. The second confederate then entered the room and said that the participants in the next room had done something wrong and had to write a statement for a police investigation. After this second interruption, the truth tellers finished the game and were asked to write a statement about this staged event.
The liars were placed in the next room and did not take part in the staged event. Instead, they were asked to erase any one word of a text written on the board. This act constituted a serious offense, since the text provided a very important clue for the police investigation. The liars were told that they would be interrogated about their act by a police officer. The experimenter told them to write a statement denying their act so that they would not be charged with the offense. Before the liars wrote their statements, the experimenter explained all the activities that had taken place in the next room and told them to write about these activities as if they had experienced the staged event themselves. To avoid short and simple statements, the experimenter described the staged event in detail and warned the liars that they would have to rewrite their statement if it were identified as deceptive.

SCAN Training and Analysis
To analyze the written statements, we trained four graduate students majoring in Korean linguistics, two males and two females, for five hours. They were grouped into two pairs, each consisting of a male and a female, which we call Pair 1 and Pair 2. We explained the 13 SCAN criteria described above to these coders with appropriate examples. Next, they practiced analyzing sample statements according to the criteria. The sample statements had all been written by real criminal suspects and were anonymized for security and privacy by replacing the real names with pseudonyms.
The statements were randomly numbered from 1 to 60 and divided into two sets: 1 to 30 and 31 to 60. Each set contains the same number of truthful and deceptive statements (15 each). The thirty statements in each set were analyzed by one pair of coders: Pair 1 was responsible for statements 1 to 30, and Pair 2 for statements 31 to 60. This design allowed us to assess the reliability of subjective judgments between the two coders in each pair. Each statement was rated on each of the 13 criteria on a scale from 1 to 9. If the average rating of a criterion across the two coders was over 5, the criterion was judged as true; if it was under 5, as false. The ratings on the 13 criteria were then averaged, and if this overall average was over 5, the statement was classified as truthful; otherwise, it was classified as deceptive.
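The scoring rule can be restated as a short function. This is a minimal sketch assuming each coder's ratings arrive as a list of 13 numbers on the 1 to 9 scale; the function names are hypothetical, and labelling a single criterion whose average is exactly 5 as 'undecided' is an assumption, since the procedure above does not specify that case. The overall rule follows the text: an average over 5 is truthful, otherwise deceptive.

```python
from statistics import mean
from typing import List, Tuple

THRESHOLD = 5.0  # midpoint of the 1-9 rating scale


def criterion_means(coder_a: List[float], coder_b: List[float]) -> List[float]:
    """Average the two coders' ratings criterion by criterion."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must rate the same criteria")
    return [(a + b) / 2 for a, b in zip(coder_a, coder_b)]


def criterion_judgments(means: List[float]) -> List[str]:
    """'true' if over 5, 'false' if under 5; 'undecided' at exactly 5 (assumed)."""
    return ["true" if m > THRESHOLD else "false" if m < THRESHOLD else "undecided"
            for m in means]


def classify(coder_a: List[float], coder_b: List[float]) -> Tuple[float, str]:
    """Overall average across criteria; over 5 is truthful, otherwise deceptive."""
    overall = mean(criterion_means(coder_a, coder_b))
    return overall, ("truthful" if overall > THRESHOLD else "deceptive")


print(classify([7] * 13, [6] * 13))  # (6.5, 'truthful')
print(classify([5] * 13, [5] * 13))  # (5.0, 'deceptive')
```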

The Effectiveness of SCAN
The first priority of our experiment is to test the effectiveness of SCAN as an analytical tool. We develop our discussion with reference to the two issues raised at the outset of this paper. First of all, truthful statements obtain higher scores than deceptive ones when assessed according to the SCAN criteria. Group A (truth tellers) received average scores of 6.33 and 5.97 from Pair 1 and Pair 2, respectively.¹ Group B (liars) received 3.70 from both pairs. Overall, the average scores across the two pairs of coders were 6.15 and 3.70 for the two groups, respectively (p < .001). The accuracy rate was 81.6%, which is very close to the rate Driscoll (1994) found when testing SCAN on written statements in English. We also computed Cronbach's alpha to test the internal consistency between the two coders and obtained 0.577, which falls below the conventional threshold of 0.70 generally taken to indicate acceptable reliability.
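For reference, the two statistics reported above can be computed as sketched below. The function treats each coder as an 'item' in the standard Cronbach's alpha formula, k/(k-1) * (1 - sum of item variances / variance of totals); whether the study computed alpha on overall statement scores or on individual criterion ratings is not stated, so the input format here is an assumption. The example numbers are toy values, not data from this study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(*raters: Sequence[float]) -> float:
    """Cronbach's alpha with each rater treated as an 'item'.

    alpha = k / (k - 1) * (1 - sum of rater variances / variance of totals),
    using sample variances across the rated statements.
    """
    k = len(raters)
    if k < 2:
        raise ValueError("need at least two raters")
    n = len(raters[0])
    if n < 2 or any(len(r) != n for r in raters):
        raise ValueError("raters must score the same statements (at least two)")
    totals = [sum(scores) for scores in zip(*raters)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    rater_var = sum(variance(r) for r in raters)
    return k / (k - 1) * (1 - rater_var / total_var)


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Proportion of statements whose predicted label matches the true one."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy values for illustration only.
print(round(cronbach_alpha([1, 2, 3, 4], [2, 1, 4, 3]), 3))  # 0.75
print(accuracy(["T", "T", "D", "D"], ["T", "D", "D", "D"]))  # 0.75
```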
The low Cronbach's alpha makes us question whether SCAN genuinely drove our results. We wonder whether the coders applied each SCAN criterion in the same manner and to the same degree. We cannot be sure, because we did not include coders without SCAN training for comparison. The coders received only five hours of SCAN training, which is rather short for understanding and applying the criteria correctly. Moreover, they had no experience as detectives. We therefore conjecture either that they relied on some other decisive factors or that they failed to apply the criteria because they lacked confidence in them. For coders' judgments to be considered reliable, there should be a high level of agreement as to which criteria are present in a statement and a high level of consistency in scoring them. This lack of reliability leads us to turn our attention to computational linguistics, whose methods and results are reliable enough to be admitted in court, as discussed in Chaski (2005, 2008) and Chaski, Barksdale and Reddington (2014). Other factors that may have influenced the coders include their general intuition and/or superficial impressions of the handwritten statements. For example, if a subject's handwriting was clean, neat, and legible, it could give the coders an impression of truthfulness. The coders might also have applied the general intuition that a longer statement is more veracious. The two groups did differ in statement length, measured in lines on letter-sized paper: truth tellers wrote longer statements than liars, 24.10 versus 17.80 lines (p < .001). We had expected at the outset of the experiment that truth tellers would write longer statements than liars, and the result met this expectation. This reflects a limitation of our design, in which fabricated statements were obtained about an artificially staged event that the liars had not experienced. A close look at the statements reveals that truthful statements are longer and more specific about the event.

Cross-linguistic Application of SCAN
Sapir (1987) states that SCAN is based on his studies of the linguistic behavior of individuals, a concept central to forensic linguistics. If we extend SCAN to other languages, we must take linguistic variation across languages into account. The second issue raised at the outset of this paper was whether all the SCAN criteria are applicable cross-linguistically. Our particular concern here is the criterion concerning the use of pronouns. According to Sapir (1987), pronouns signal responsibility, possession, and relationship. The first person singular pronoun 'I' is the most frequently used pronoun in a witness or victim statement. According to Sapir (1987), improper use of pronouns typically includes omitting 'I' or replacing it with its plural counterpart 'we'. By omitting 'I', the writer seeks to avoid full commitment to an incident. By replacing it with 'we', which includes other people, the deceptive writer seeks to share, and thereby lessen, his responsibility for a wrongdoing.
It should be noted, however, that this criterion is not universally applicable across languages. Korean, for example, is a pro-drop language, in which subject pronouns are often dropped even in an ordinary truthful statement. Once the referent of a pronoun has been established, for example, the pronoun is usually dropped in subsequent sentences. In addition, the use of pronouns is highly restricted with respect to seniors such as parents and grandparents: when they are the subject of a sentence, the third person singular pronoun is usually omitted or the original nominal subject is repeated (Kang, 2000). The first person singular pronoun, in particular, is often omitted even at the very beginning of a narrative.
We examined the first sentence of each statement and found that all 60 statements begin with a first person singular subject 'I', whether overt or omitted. The pronoun was overt in 14 statements (23.3%) and omitted in the other 46 statements (76.7%). The 14 statements are divided into 6 truthful and 8 deceptive statements. Interestingly, the other 46 statements are evenly divided between 23 truthful and 23 deceptive statements, yielding no difference between the two types of statements with respect to the use of pronouns. It follows that the SCAN criterion of improper use of pronouns cannot serve as a guideline for detecting deception, at least in Korean and probably in other pro-drop languages as well.
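The overt-versus-omitted check on first sentences was carried out by reading the statements; as a rough sketch of how it might be automated, the code below looks for a small, hypothetical set of overt first person singular forms (plain 나 and humble 저 combined with common particles). It is deliberately naive: 나는, for instance, can also be a verb form of 날다 'to fly', so matches would still need manual confirmation. The `tally` helper produces the kind of label-by-form counts reported above.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical, non-exhaustive overt first person singular forms: plain 나 and
# humble 저 with topic 는, subject 가 (contracted to 내가/제가), or additive 도.
FIRST_PERSON_FORMS = {"나는", "내가", "나도", "저는", "제가", "저도"}


def first_sentence(text: str) -> str:
    # Naive splitter: everything up to the first '.', '!' or '?' plus whitespace.
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


def has_overt_first_person(text: str) -> bool:
    """True if the first sentence contains one of FIRST_PERSON_FORMS."""
    tokens = re.findall(r"[가-힣]+", first_sentence(text))  # Hangul syllable runs
    return any(token in FIRST_PERSON_FORMS for token in tokens)


def tally(statements: Iterable[Tuple[str, str]]) -> Counter:
    """Count (label, 'overt' or 'omitted') pairs over (text, label) statements."""
    counts: Counter = Counter()
    for text, label in statements:
        form = "overt" if has_overt_first_person(text) else "omitted"
        counts[(label, form)] += 1
    return counts


print(has_overt_first_person("나는 아침 7시에 일어났다. 친구를 만났다."))  # True
print(has_overt_first_person("아침 7시에 일어났다. 친구를 만났다."))      # False
```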
It seems, then, that the applicability of Sapir's SCAN criteria varies across languages. As mentioned above, pronouns rarely play a decisive role in deception detection in Korean. Connectives, however, have been argued to play a vital role in Korean (Kim, 2010: 91). This can be attributed to the agglutinative nature of Korean, in which words are formed by attaching bound morphemes (affixes) that carry grammatical information. Korean has two types of connectives: conjunctive adverbs and connective suffixes. The former correspond to English connectives such as and, but, so, therefore, etc., whereas the latter are bound morphemes attached to verbal stems. Their patterns of attachment are so numerous and complex that improper use of these connectives can easily and overtly reveal the writer's deceptive intention.

Conclusion
We conducted an experiment to evaluate the effectiveness of the SCAN criteria and their applicability to Korean. First, the results of our experiment, an accuracy rate of 81.6%, seem to show the potential of SCAN to identify deceptive content in a written statement. The low Cronbach's alpha, however, indicates unreliable internal consistency between coders, which in turn makes us doubt the genuine effectiveness of SCAN as an analytical tool. We attribute this to the fact that the coders made subjective judgments on the criteria and were influenced by non-linguistic factors such as their general knowledge and intuitions about written statements. Future research could verify the genuine effect of SCAN by having the same written statements assessed by multiple coders without any SCAN training.
Second, the use of pronouns, an important SCAN criterion as well as a cohesive device in writing, behaves differently in Korean than in English. Pronouns in Korean, including the first person singular 'I', are often omitted in both spoken and written statements. This omission is not motivated by an individual's deceptive intention but is simply a major pragmatic feature of Korean. We found no significant difference between truthful and deceptive statements in the use of pronouns. Instead, Korean has developed a highly complex system of connectives to maintain cohesion.
In conclusion, despite the low Cronbach's alpha, the results of our experiment suggest that SCAN is effective in discriminating between truthful and deceptive written statements. If the coders' subjective judgments on the criteria are the main reason for the inconsistency between them, we may need another analytical tool that is objective and scientifically replicable during assessment. Finally, the distinctive use of pronouns in Korean requires that the current SCAN criteria be adapted to cross-linguistic variation in order to extend their explanatory power.