In this paper, we explore the impact of combining two key components of behavioral analysis into a training course. Firstly, we aim to dispel the myth that humans are unable to capture data across multiple communication channels (Face, Body/Gestures, Voice, Linguistic Content, Interactive (verbal) Style and Psychophysiology); they can, as long as they are oriented towards 27 relevant cues (which we label PInS: Points of Interest) that aid veracity judgments. It is about noticing what we see and hear. We highlight the trade-off of introducing a truth bias to enable a manageable methodology for real-time lie/truth detection using a holistic approach with concurrent attention to multiple channels of a subject’s behavior.
Secondly, we highlight that the key to lie detection is in the questions you ask: unpredictable questions that are no trouble for a truthful person yet are a major challenge for the liar, creating leakage across six channels that are probably impossible to manage simultaneously. We highlight the challenges of creating effective, efficient training processes that are fit for purpose in the field, and the need to devise innovative approaches for those working in high-stakes, real-time contexts, maybe working alone, maybe in interactions that last only three minutes. Ultimately, we are driven to share and contribute towards the real-world application of deception and credibility research in forensic and security settings to make this world a safer place for all, including those employed to protect us.
This paper outlines the impact of 56 hours of ‘ETaC/SCAnR Training’ upon a group of 23 individuals regarding their skill in distinguishing truthful accounts from deceptive accounts. The training combined two main elements: a) developing abilities and skills in seeing/hearing data using the Evaluating Truthfulness and Credibility (ETaC) course, plus b) the extended training on the Six Channel Analysis Real-time (SCAnR) model, and applying a Real-time Cognitive Conversation strategy (RC2).
- Hypothesis – the accuracy of veracity judgments on real-time interviews using the taught methodology would be significantly better than chance following 56 hours of ETaC/SCAnR training.
PRE-POST TESTING APPROACH
The ability of professionals, including security, law enforcement, lawyers, judges, secret service agents and business people, to make judgments about lies/truth from laypersons averages 54%, according to a meta-analysis of 206 studies involving a range of professional and other groups (Bond & DePaulo, 2006). One of the factors responsible for such a poor performance is the view that professionals responsible for detecting deception have been trained with incorrect and inaccurate procedures that continue to circulate within the unique cultures of these professions (Strömwall, Granhag, & Hartwig, 2004). This study included a pre-test to produce a baseline of performance from the 23 SRI/OTP cross-agency security professionals on multichannel deception cues and elicitation techniques. There are a number of structural features that need to be recreated for testing on a course to have any value in real-world high-stake situations (Frank, 1992; Ekman, 2001; Podlesny & Raskin, 1977).
The testing involved course delegates in the analysis of a combination of two types of tests. The first type involved the 23 delegates each making a two-minute introduction about their personal and professional lives. The introductions were managed so that two delegates, one male and one female, chosen at random 30 minutes in advance of the start of the course, could be discreetly asked to hold back and introduce themselves after the other 21 had done so. After the 21 introductions the whole group was briefed that delegates 22 and 23 had each randomly drawn an instruction from a bag of 12 pebbles (six ‘lie’ pebbles and six ‘truth’ pebbles) to either tell the truth about themselves or to fabricate a complete lie about their lives previous to their current job.
The rules changed for these two subjects in that a volunteer from the group would invite them to introduce themselves for two minutes and then question them for up to 15 minutes about their history, with the goal of helping the 21 delegates to correctly judge which ‘pebble’ they had in their pocket. The two subjects were briefed to use all their skills to convince everyone that they were telling the truth about their past, regardless of the pebble in their pocket, and were informed of a monetary reward if they succeeded. The experiment administrators recorded the sessions for later analysis, collated the results of the experiment and debriefed the subjects on their judgment of the quality of the interview and the difficulties they experienced during it.
The second experiment used a ‘mock theft’ scenario where two volunteers from the wider organization, unknown to the delegates, were briefed to engage in a mock theft in an adjacent room. The experiment was cleared with line management beforehand. The subjects were isolated from each other and briefed about the ‘mock theft’ experiment, in which they would be invited to enter a separate room where they would find a jacket on a table. Inside the jacket was a wallet, within which we had placed 100 USD. Each subject was instructed to enter the room, inspect the jacket and the wallet, examine how much money was in the wallet and then make a free choice to steal or not steal the money from the wallet. They were then to put the wallet back in the jacket, put the jacket back on the table and leave the room. A research team member was present during the act so ground truth could be established. The delegate group and group facilitators were blind to the act in each of the experiments.
If subjects chose to steal the money and managed to convince the majority of the trained delegates that they didn’t steal it, they could keep the money. If they failed to convince the majority, they would lose all the money and also lose the 25 USD participation fee. If, however, they chose to leave the money where it was and convinced the majority of the trained delegates that they didn’t steal the money, they would receive 25 USD plus the 25 USD participation money. If they failed to convince the majority that they were being truthful, they would lose all the money. The subjects were informed that they had a free choice, that they could withdraw at any time and that this was only an experiment with no other consequences for them for the decision they were to take.
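The incentive structure of the mock theft can be tabulated in a few lines. The following Python sketch is illustrative only: the function and constant names are hypothetical, and it assumes that a subject who steals and is believed keeps the 25 USD participation fee as well as the 100 USD, and that a disbelieved truth teller also forfeits the fee. The description implies both points but does not state them outright.

```python
WALLET_CASH = 100        # USD placed in the wallet
PARTICIPATION_FEE = 25   # USD paid for taking part
HONESTY_REWARD = 25      # USD for leaving the money and being believed


def payoff(stole: bool, believed: bool) -> int:
    """Return a subject's total payout in USD for one mock-theft outcome."""
    if not believed:
        # Assumption: failing to convince the majority forfeits everything,
        # including the participation fee, whether or not the money was taken.
        return 0
    if stole:
        # Assumption: the participation fee is kept alongside the stolen cash.
        return WALLET_CASH + PARTICIPATION_FEE
    return HONESTY_REWARD + PARTICIPATION_FEE


# Print the four possible outcomes.
for stole in (True, False):
    for believed in (True, False):
        print(f"stole={stole!s:5} believed={believed!s:5} -> {payoff(stole, believed)} USD")
```

Under these assumptions, being believed always pays more than not being believed, so both liars and truth tellers had a stake in appearing credible.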
After each subject had exited the room where the wallet was located, they were directed into the main training room, where each was met by a different volunteer interviewer (a delegate from the course), with the rest of the delegates observing from chairs in a wide horseshoe around the interview setting. The interviewers were given five minutes to prepare plus a maximum of 15 minutes to interview each subject. At that time-out the subject would be asked to leave the room for a short while and then invited back in when the interviewers were ready. At the end of the ten minutes the subject would be asked to leave the room whilst all delegates (including the interviewer) were asked to independently indicate whether they judged the subject to have stolen the money (lying) or not stolen the money (truth teller).
In addition, delegates were tested on their knowledge and understanding of lie/truth detection with a multiple response test at the start and end of the course assessing understanding of core concepts and principles.
After the pre-test the individual results were collected and shared but not analyzed so similar experiments could be used for the post-test.
The post-test repeated the pre-test exactly, immediately on completion of the course, along with a further half-day simulation assessment within Otopeni (OTP) airport.
This involved 12 of the delegates performing undercover as passengers, each being assigned a ‘subject’ identified by a head/shoulder photograph and instructed to find them in either the public, security or gate zones we had established. The 12 subjects (six male, six female) were volunteers from among the 3rd year students of the Romanian Military Intelligence Academy, individuals who had undergone various degrees of military and intelligence training themselves. They supplied the authors with authenticated resumes so we had access to ground truth about their ID and past. They were briefed 24 hours in advance of the experiment by two of the authors and they randomly chose ‘pebbles’ from a bag, resulting in eight drawing a ‘truth’ pebble and four drawing a ‘lie’ pebble. The eight truth tellers were briefed to simply enter the airport and tell the truth about themselves and their past if a fellow passenger approached them; they lost their participation money if they were not believed to be truthful. The other four were briefed to fabricate everything about their past except for their name, current job and study program, which they were told to tell the truth about; a financial reward of $100 was offered for successfully deceiving their way through the interviews. This was lost along with participation money if they failed. They were prompted to spend time overnight developing and rehearsing their cover story. Delegates were instructed neither to access the internet/systems nor to probe about the flight and other future plans (to save the expense of arranging real flights, documents and destination activities). Subjects were given permission to block any such attempts with the simple answer, “I cannot discuss that as I am engaged in a psychological experiment right now”. Delegates were made aware of this. Subjects were told not to supply any documents (passport, boarding passes, etc.) to the delegate who approached them, even if asked.
The 12 undercover delegates were fitted with covert (wearable) recording equipment (watches, spectacles, body cameras, etc.) and given a maximum of 15 minutes to engage their subject in a casual conversation and decide at the end of that whether to refer the subject to a second stage interview with two fellow delegates in an interview room, or to close the conversation and report to the nearby experiment team member that they were happy that the subject was being truthful about their ID and history.
The second stage, more formal interviews were capped at 15 minutes, after which the interviewers had to decide on whether their referred subject was being truthful or lying about their ID/past. These were observed/recorded by support teams and organized so all delegates performed either an informal or a formal interview. Results were documented.
The delegates consisted of 23 professionals (14 male, nine female) aged between 25 and 56 years old, drawn from across all the major civil airport security and military/government agencies/groups, including uniformed and non-uniformed staff from Border Police, Immigration, Passenger Screening, Transport Police, Behavior Detection Officers (BDOs), Air Marshals, Aviation Security officials, Gendarmerie, Intelligence Service, etc.
ETaC/SCAnR TRAINING DESIGN
The training and research were designed, tailored and coordinated by Emotional Intelligence Academy and combined two main elements: a) developing abilities and skills in seeing/hearing lie/truth related data using the Evaluating Truthfulness and Credibility (ETaC) course (accredited and designed by Paul Ekman PhD and delivered through an international network of Paul Ekman International Licensed Delivery Centers), plus b) the extended training on the Six Channel Analysis Real-time (SCAnR) model, and applying a Real-time Cognitive Conversation strategy (RC2). This forms part of the BAII Diploma, which carried 60 credits towards an MSc in Communications, Behavior and Credibility Analysis at Manchester Metropolitan University. The design also recognized that many of the high-stake, often casual, conversations that security staff engage in across the security layers are often no longer than three minutes in duration (airport screening checkpoints, check-in, customs, border police, BDO/law enforcement interviews on the airport floor, etc.), and that staff often have to work alone. This was the contextual requirement we were aiming to serve and test, as the approach would be readily scalable to the luxury of the longer, more formal second-stage interview contexts in airports.
Although stereotypes about the non-verbal behavior of liars are numerous, researchers to date have failed to identify many reliable cues (DePaulo, Lindsay, Malone, Muhlenbruck, Charlton, & Cooper, 2003; Vrij, 2004). Most deception studies focus on cues that are considered to be highly indicative of deception, without considering all communication channels simultaneously (Archer & Lansley, 2015). Passive behavior observation of face and body alone may not be sufficient but has delivered 900% higher performance than random sampling in USA airports in detecting malfeasants in high-stake situations. Multi-channel analysis is necessary as ‘different people show different cues to deception’ in different contexts (Vrij, 2008: 8). Porter & ten Brinke (2010) argue that,
In looking to the future Porter & ten Brinke (ibid) propose:
(1) there is a need for education in the major pitfalls to avoid in catching lies and acknowledging the importance of critical thinking to circumvent tunnel vision associated with overzealousness.
(2) the lie catcher should direct his/her attention to prominent and ideally consistent changes (across multiple channels) from baseline behavior with a focus on the most reliable specific cues to deception.
(3) the lie detector should consider the use of strategic interviewing techniques such as manipulating a suspect’s cognitive load by asking unanticipated questions.
Ormerod & Dando (2014) demonstrated the value of randomized, controlled questions in their study, though the emphasis was on verbal channels and the study set the performance against limited non-verbal behavior. It seemed obvious that the way forward was to employ what we termed real-time cognitive conversations (RC2) in combination with analysis of all six channels, though the risk was in overloading the trainee behavioral analyst with too much data. The overload was likened to drinking water from a fire hose, so we needed to consider orienting the behavioral analyst towards significant stimuli: what Porter & ten Brinke referred to in their second proposal above as “the most reliable specific cues to deception”.
The key features in the Archer & Lansley (2015) study were threefold with the key elements extracted here:
1. “We took the controversial and uncomfortable step of taking truth for granted. We assume the truth from the start. This was a bias that will be spurned in laboratory settings though paid off significantly in performance outcomes.
2. We then carried out a meta analysis of research around deception cues in medium to high stake contexts and prioritized 27 indicators (we merely termed them “Points of Interest” – PInS) across all six channels. A PIn was only qualified as such when a cue was inconsistent with the Account6, Baseline or Context. Each of the 27 PInS sit on empirical research (Appendix 1).
3. We were particularly interested in the occurrence of three or more PInS across two or more channels within seven seconds of the RC2 stimulus/question.”
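The third feature reads as a simple threshold rule, and it can be sketched in Python to make the counting explicit. This is a minimal illustration, not the authors' coding procedure: the `PIn` record, the `is_cluster` function and the example codes are hypothetical, each observed PIn is assumed to count once, and the seven-second window is assumed to include its end point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PIn:
    code: str                      # e.g. "F2" or "V1" (labels as used in Appendix 1)
    channel: str                   # e.g. "Face", "Voice", "Body"
    seconds_after_question: float  # time elapsed since the RC2 question was asked


def is_cluster(pins, window_s=7.0, min_pins=3, min_channels=2):
    """True if enough PIns, spread over enough channels, fall inside the window."""
    in_window = [p for p in pins if 0.0 <= p.seconds_after_question <= window_s]
    channels = {p.channel for p in in_window}
    return len(in_window) >= min_pins and len(channels) >= min_channels


# Hypothetical observations following one question.
observed = [
    PIn("F2", "Face", 1.2),
    PIn("V1", "Voice", 3.0),
    PIn("B3", "Body", 6.5),
]
print(is_cluster(observed))
```

Keeping the thresholds as parameters makes it easy to explore how a stricter or looser rule would change which responses are flagged for follow-up.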
The 56-hour ETaC/SCAnR training was structured over two four-day events, with 50% input on models and concepts and 50% application, coaching, assessment, feedback and reinforcement, building the channel analysis skills using text, audio, video and real people. The group used each other under close observation to develop attentiveness skills, manage biases and remove contaminating questioning interventions.
RESULTS AND DISCUSSION
Every delegate reported that they made significant progress over the 56 hours and the results supported the overall achievement. The knowledge and skills tests were used to embed key principles to aid performance transfer beyond the activities used on the course. The process and results can be summarized as follows:
- 23 SRI/OTP cross-agency team
- Pretest = 55.1% accuracy – (2 ID and 2 Mock Theft)
- 4 days ETaC + 2 days Behavior Analysis + 2 days Elicitation Techniques
- Post test = 84.67% accuracy – (2 ID and 2 Mock Theft + 12 2-stage simulation in Airport)
Hypothesis: the accuracy of veracity judgments on real-time interviews using the taught methodology was 84.67% following 56 hours of ETaC/SCAnR training, significantly better than chance. The likelihood of the results being due to chance is 2%. This supports our hypothesis that “the accuracy of veracity judgments on real-time interviews using the taught methodology would be significantly better than chance following 56 hours of ETaC/SCAnR training.”
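The paper does not state which test produced the 2% figure or how many individual judgments were pooled. As a sketch of one common way to compare an accuracy rate with chance, an exact one-sided binomial test can be written as below. The function name is hypothetical, the counts in the usage line are purely illustrative and are not the study's data, and the test assumes independent judgments, which may not hold when many delegates judge the same subject.

```python
from math import comb


def binomial_p_value(correct: int, total: int, chance: float = 0.5) -> float:
    """Exact one-sided probability of at least `correct` hits out of `total`
    judgments if every judgment were a guess with success rate `chance`."""
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Illustrative counts only: 13 correct judgments out of 16.
p = binomial_p_value(13, 16)
print(f"p = {p:.4f}")
```

A small value indicates that the observed accuracy would be unlikely if judges were guessing; the threshold for "significant" remains a matter of convention.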
The delegates reported that the key for them was the multi-channel data collection combined with the usefulness of engaging, casual conversations that created no problems for the truthful subject but created trouble for the liar, who leaked the truth from across the channels when they were lying.
THE WAY FORWARD
This experiment has been replicated in different contexts though it also needs repeating with different delegates using the same methodology, design and trainers so we can limit any extraneous variables. The research goals are to apply this on a wider scale in Bucharest Airports over 2016 and segment the research into four areas:
- Primary detection via passive crowd analysis (CCTV and BDOs)
- Casual conversations by undercover and uniformed staff focused on and including the pre-screening layer in departures
- Casual conversations by undercover and uniformed staff focused on and including the gate area and onboard the plane
- Formal interviews where referrals are made from the above layers.
We are also hoping to engage the support of one or more airlines to extend opportunities to the public and check-in zones between the entrance and screening, and are in the process of securing engagement with the stakeholders in these zones to test ETaC/SCAnR training in these environments. Pre-measures and control group parallels will be secured for evaluation purposes.
The pretests were taken in the morning on day one; the post test in the afternoon on the last day. This variable needs to be removed so attention and fatigue are not an influence on results. We need to also consider running the airport simulation experiment so we have a pre-test in that format.
We may need to look creatively at raising the positive and negative stakes to better replicate the high-stake conditions of safety and security breaches we are seeking to thwart whilst being mindful of the ethical factors in psychological research in live environments.
Following extended studies with a range of experiments, the frequency of occurrence of each of the 27 PInS may show that they can be prioritized further, so we can establish an optimized hierarchy and identify the point at which a reduced model becomes unreliable. We also need to be mindful of emerging research in light of which the 27 PInS and elicitation tactics may need to be extended.
REFERENCES
Archer, D.E. & Lansley, C.A. (2015). Public appeals, news interviews and crocodile tears: an argument for multi-channel analysis. [Online] Available at: http://www.euppublishing.com [Accessed 15 August 2015].
Bond, C. F. J., & DePaulo, B. M. (2006). Accuracy of deception judgments. Personality and Social Psychology Review, 10(3), 214-234.
DePaulo, B. M., Lindsay, J. J., Malone, B. E., Muhlenbruck, L., Charlton, K., & Cooper, H. (2003). Cues to deception. Psychological Bulletin, 129(1), 74-118.
Ekman, P. (2001). Telling lies: Clues to deceit in the marketplace, politics, and marriage. New York: W W Norton & Co.
Frank, M.G. (1992). Commentary: On the structure of lies and deception situations. In S. Ceci, M. DeSimone Leichtman, & M.B. Putnik (Eds.), Cognitive and social factors in early deception (pp. 127-146). Hillsdale, NJ: Erlbaum.
National Research Council (2002). National Research Council, Committee to Review the Scientific Evidence of the Polygraph, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.
Ormerod, T. C., & Dando, C. J. (2014, November 3). Finding a Needle in a Haystack: Toward a Psychologically Informed Method for Aviation Security Screening. Journal of Experimental Psychology: General. Advance online publication. http://dx.doi.org/10.1037/xge0000030
Pearse, J., & Lansley, C.A. (2010, October 10). Reading Others – A psychological model for making sense of truth and lies. Training Journal.
Podlesny, J.A., & Raskin, D. (1977). Physiological measures and the detection of deception. Psychological Bulletin, 84, 782-799.
Porter, S., & ten Brinke, L. (2010). The truth about lies: What works in detecting high-stakes deception? Invited article in a Special Issue of Legal and Criminological Psychology, 14, 119-134. doi:10.1348/135532509X433151
Strömwall, L. A., Granhag, P. A., & Hartwig, M. (2004). Practitioners’ beliefs about deception. In P.Granhag, & L. Strömwall (Eds.), The detection of deception in forensic contexts. (pp. 229-250). New York: Cambridge University Press.
Ten Brinke, L., & Porter, S. (2012). Cry me a river: Identifying the behavioral consequences of extremely high-stakes interpersonal deception. Law and Human Behavior, 36(6), 469-477.
Vrij, A., & Mann, S. (2001). Who killed my relative? Police officers’ ability to detect real-life high-stakes lies. Psychology, Crime and Law, 7, 119-132.
Vrij, A. (2004). Why professionals fail to catch liars and how they can improve. Legal and Criminological Psychology, 9(2), 159-181.
Wright Whelan, C. (2012). High stakes lies: Identifying and using cues to deception and honesty in appeals for missing and murdered relatives. Unpublished. University of Liverpool, UK.
APPENDIX 1 – 27 PInS overview and index to research
The psychophysiological/ANS channel (P) accounts for seven of the twenty-seven PIns, five of which capture physiological signals that users can sometimes see and hear without technical aids: that is, changes in skin colour, perspiration (P3), blood pressure on visible veins (P4), breath (P5), dryness of the mouth (P6) and pupil size (P7). P1 and P2 relate to changes which usually require technology to be detected (e.g., heart rate and galvanic skin response monitoring).
Similar to ten Brinke and Porter (2012), the EIA Group model draws on insights from FACS-related research for the first of the five Face codes (F1). Specifically, our (FACS-trained) SCAnR coders catalogue FACS anomalies with seventeen key FACS codes. F2 marks a durational misfit (Ekman and Frank, 1993), F3 marks evidence of asymmetry (unless indicative of contempt), and F4, evidence of asynchrony between muscle movement across the face for ‘felt’ emotion(s) (unless subtle). Finally, F5 marks onset/offset profiles which do not display the smooth onset/offset patterns characteristic of felt emotions (Ekman and Frank, 1993). As with all features, we required inter-annotator agreement between three of the four coders for an individual inconsistency to be verified as a PIn in the SCAnR analysis proper.
The Body channel (B) captures features shown to be of value in emotion and veracity judgments (Vrij et al., 1996): specifically, micro-gestures or gestural slips indicative of ‘leakage’ (B1); evidence of change(s) in illustrator behaviour (B2) and/or manipulators (B3); evidence of (muscle) tension in the body (B4); and changes in eye behaviour (blinks, eye gaze/movement/closure, etc.) (B5). Codes B1 to B5 are based on a more detailed system of Action Descriptors (ADs) developed by the team at EIA‡‡ (cf. ten Brinke and Porter, who simplified FACS Action Units/Descriptors or AUs to aid realtime annotation and multi-rater coding comparisons).
The SCAnR coding adopted for the Voice (V) enables realtime analysis of changes to pitch (V1), volume (V2) and tone (V3), but coders can also note sound lengthening, backchannels, stressed syllables, utterance trail offs, etc. (Rockwell et al., 1997).
Interactional Style (IS) is our label for phenomena such as fillers, parroting, evasion strategies (including equivocation markers), response latency, emphatic statements, repetition, qualifiers, pronoun usage (e.g., use of third person/avoidance of first person, or vice versa), (de)personalisation, distancing devices, etc. (following Jurafsky et al., 2009). More specifically, I1 marks changes to the rhythm (or ‘flow’) of the interaction because of features such as (filled) pauses, stutters, disfluencies, response latency, and so on. I2 marks evidence of evasiveness/ambiguity/equivocation (Wright Whelan et al., 2013). I3 encompasses influencing or impression management strategies: for example, the use of religious belief/values/character references, credibility labels, or a proof/evidence frame (Houston et al., 2012), representational frames relating to the subject, inappropriate politeness, repetition, etc.
The Content channel (C) contains four PIns. C1 captures changes in tense or inappropriate tense usage (such as when someone pleads for the return of a loved one, but refers to them in the past tense). C2 captures distancing language, following DePaulo et al.’s (2003) observation that deceivers will sometimes use linguistic constructions (e.g., fewer self-references, more tentative words) which serve to distance them from the subject(s) of their speech (see also Hancock and Woodworth, 2013). Here, SCAnR users might consider pronouns, tentativeness features, subject/noun changes, emotional terms/affective language, inappropriate concern, qualifiers, minimisers and other epistemic modality markers, etc. (Bond and Lee, 2005; ten Brinke and Porter, 2012; and Newman et al., 2003).
The third Content criterion, C3, makes use of an adapted version of Criteria-Based Content Analysis (CBCA)§§. Although CBCA is primarily used to assist (European) courts in evaluating the credibility of children’s (transcribed) narratives of sexual abuse, it has been used to evaluate adult accounts relating to issues other than sexual abuse (Porter and Yuille 1996; and Vrij et al., 2000). When drawing on CBCA criteria, Vrij et al. (2000) used a restricted set in combination with Reality Monitoring criteria. The SCAnR method, in contrast, has been to amend CBCA criteria*** so that users might record, as a PIn, occasions when the content of the story that S conveys: (i) lacks coherence, (ii) lacks unstructured, spontaneous reproduction, (iii) includes inappropriate detail, especially relative to the core of the story and what we know about memory (the account may also be void of related associations and unusual/superfluous details), (iv) exhibits contextual vagueness (as opposed to being characterised by contextual embedding), (v) is devoid of descriptions of interactions (including [recalled] verbatim conversations), (vi) is devoid of admissions of poor memory recall/spontaneous correction of memory errors (without prompting) and self-deprecation, and (vii) is devoid of accounts of mental states (self and other). The final Content criterion, C4, is used when the SCAnR user recognises a verbal slip as a PIn (Ekman, 2004: 40).
‡‡ The EIA www.emotional-intelligence-academy.com team have catalogued/illustrated each action/signal with a still photograph or video, and assigned it a specific AD reference number. The photographs and videos are not to be taken too literally, as they are designed to exemplify, not typify. There will be slight variations from person to person due to their anatomy and flexibility.
§§ CBCA assumes that ‘a statement derived from memory of an actual experience differs in content and quality from a statement based on invention or fantasy’ (Vrij, 2008: 209).
*** This aspect of SCAnR may prove to be the most contentious, given that CBCA was developed to assess witness credibility, not whether a person was telling the truth or being deceptive, and has previously struggled to distinguish short lies (non-experienced elements) within otherwise truthful stories (i.e., experienced events; see Vrij and Mann, 2001). In our defense, we point to the fact that we are not the only researchers to use CBCA in deception detection research (see, for example, Colwell, 2007; and Vrij et al., 2004, 2007), and the fact that our adapted version of CBCA is but one component of the SCAnR method.