Masters Defense


Bayesian Network Modeling of Offender Behavior for Criminal Profiling
Masters Defense
Kelli Crews Baumgartner
Laboratory for Intelligent Systems & Control
Mechanical Engineering
Duke University
10/18/2007

Outline
• Background to Criminal Profiling
• Motivation
• Bayesian Networks (BNs)
• Methodology
• Results
• Future Work
• Acknowledgments

The Case of the Mad Bomber
Profile:
• Male
• Eastern European
• Late 40s to early 50s
• Lived in Connecticut with a maiden aunt or sister
• Paranoia
• Wearing a double-breasted suit, buttoned
Actual offender:
• George Meteskey
• Slavic descent
• 54 years old
• Lived in Connecticut with two maiden sisters
• Acute paranoia
• When apprehended, was wearing a double-breasted suit, buttoned

Background
• Goals of criminal profiling:
  – Concentrate criminal investigations
  – Interrogation strategies
• Collaboration with law enforcement

Previous Research
• Early 1980s: FBI developed the organized/disorganized dichotomy
• 1985-1994: Dr. David Canter expanded the FBI model: interpersonal coherence, time and place, criminal characteristics, criminal behavior, and forensic awareness
• 1999-Present: G. Salfati and D. Canter explore the expressive/instrumental dichotomy

Expressive vs. Instrumental
• Uses multidimensional scaling (MDS)
  – Non-metric multidimensional scaling procedure
  – Plots association coefficients for each crime scene behavior
  – Similarly themed actions co-occur in the same region of the plot, and vice versa
• Single-offender, single-victim homicides handled by the British police (82 cases and 247 cases)
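The association coefficients that feed the MDS plot can be sketched as below. The slide does not name the coefficient; this sketch assumes Jaccard's coefficient (cases where both behaviors occur divided by cases where either occurs), and the behavior names and case data are hypothetical. The MDS step that places strongly associated behaviors close together is not shown.

```python
from itertools import combinations

def jaccard(cases, a, b):
    """Jaccard coefficient for two binary crime scene behaviors.

    cases: list of sets, each holding the behaviors present in one case.
    Returns |a and b| / |a or b|, or 0.0 if neither behavior ever occurs.
    """
    both = sum(1 for c in cases if a in c and b in c)
    either = sum(1 for c in cases if a in c or b in c)
    return both / either if either else 0.0

def association_matrix(cases, behaviors):
    """Symmetric dict-of-dicts of pairwise Jaccard coefficients."""
    m = {x: {y: 1.0 if x == y else 0.0 for y in behaviors} for x in behaviors}
    for a, b in combinations(behaviors, 2):
        m[a][b] = m[b][a] = jaccard(cases, a, b)
    return m

# Hypothetical cases: each set lists the behaviors observed at one scene.
cases = [
    {"multiple_wounds", "face_hidden"},
    {"multiple_wounds", "face_hidden", "weapon_brought"},
    {"weapon_brought", "property_stolen"},
    {"property_stolen"},
]
behaviors = ["multiple_wounds", "face_hidden", "weapon_brought", "property_stolen"]
m = association_matrix(cases, behaviors)
print(m["multiple_wounds"]["face_hidden"])     # 1.0 (always co-occur)
print(m["weapon_brought"]["property_stolen"])  # 0.3333333333333333
```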

Results of MDS Research
• 62% of the cases exhibited a majority of the crime scene characteristics in a single theme
• 74% of all offenders could also be classified as either expressive or instrumental
• 55% of the cases exhibited the same theme in both their crime scene actions and the offender background characteristics
• High-frequency crime scene behaviors can be eliminated
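Percentages such as "62% of the cases" depend on a rule for assigning a case to a theme. A minimal sketch, assuming a case is assigned to the theme that holds a strict majority of its observed themed behaviors and is otherwise left unclassified; the behavior-to-theme mapping is hypothetical, not the published one.

```python
from collections import Counter

# Hypothetical mapping of crime scene behaviors to themes.
THEME = {
    "multiple_wounds": "expressive",
    "face_hidden": "expressive",
    "weapon_brought": "instrumental",
    "property_stolen": "instrumental",
}

def classify_case(behaviors):
    """Return the theme holding a strict majority of the case's themed
    behaviors, or None if no theme has a majority."""
    counts = Counter(THEME[b] for b in behaviors if b in THEME)
    total = sum(counts.values())
    if total == 0:
        return None
    theme, n = counts.most_common(1)[0]
    return theme if n > total / 2 else None

def share_classified(cases):
    """Fraction of cases that fall into a single theme."""
    return sum(classify_case(c) is not None for c in cases) / len(cases)

cases = [
    {"multiple_wounds", "face_hidden", "property_stolen"},  # expressive
    {"weapon_brought", "face_hidden"},                      # tie -> None
    {"property_stolen"},                                    # instrumental
]
print([classify_case(c) for c in cases])  # ['expressive', None, 'instrumental']
print(share_classified(cases))            # 0.6666666666666666
```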

Motivation
• Develop a network model linking the profile of the offender to his/her decisions and behaviors at the crime scene
• Determine correlations between input and output variables based on data from real cases
  – Input variables: crime scene analysis (CSA), victimology assessment, and medical examination
  – Output variables: offender profile (sex of offender, prior convictions, relationship with victim)
  – Training data: solved cases (inputs/outputs known)
• Apply software to produce the offender profile for unsolved cases, given the input variables

Criminal Profiling Software Development
• Use expert knowledge to initialize BN variables
• Train the network model (NM) with solved cases
• Test the NM with validation cases by inputting the crime scene variables and comparing the predicted offender variables to the observed offender variables
[Figure: flow diagram with blocks Data, Initial Model, Inputs, Trained Model, Outputs]

Belief Networks
• Bayesian Networks, B = (S, Θ)
  – Probabilistic Network
[Figure: X1 → X2, X1 → X3, where x_i,j denotes variable i in state j]

P(X1):
P(x1,1) = 0.4 | P(x1,2) = 0.6

P(X2 | X1):
X1 | P(x2,1 | X1) | P(x2,2 | X1)
x1,1 | 0.8 | 0.2
x1,2 | 0.3 | 0.7

P(X3 | X1):
X1 | P(x3,1 | X1) | P(x3,2 | X1)
x1,1 | 0.1 | 0.9
x1,2 | 0.5 | 0.5
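Because X1 is the only parent of X2 and of X3, the joint distribution factorizes as P(X1, X2, X3) = P(X1) P(X2 | X1) P(X3 | X1). A minimal sketch using the example's numbers, assuming the CPT rows are laid out as in the tables above; the dictionary layout is only an illustrative choice.

```python
from itertools import product

# CPTs from the example; states are indexed 1 and 2 as in x_{i,j}.
P_X1 = {1: 0.4, 2: 0.6}
P_X2_given_X1 = {1: {1: 0.8, 2: 0.2}, 2: {1: 0.3, 2: 0.7}}  # [x1][x2]
P_X3_given_X1 = {1: {1: 0.1, 2: 0.9}, 2: {1: 0.5, 2: 0.5}}  # [x1][x3]

def joint(x1, x2, x3):
    """P(X1=x1, X2=x2, X3=x3) = P(x1) P(x2 | x1) P(x3 | x1)."""
    return P_X1[x1] * P_X2_given_X1[x1][x2] * P_X3_given_X1[x1][x3]

total = sum(joint(a, b, c) for a, b, c in product((1, 2), repeat=3))
print(round(total, 10))           # 1.0
print(round(joint(1, 1, 1), 10))  # 0.032
```

The factorized form needs 1 + 2 + 2 = 5 free parameters instead of the 2³ − 1 = 7 of an unrestricted joint table over three binary variables, and the savings grow quickly as variables are added.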

Example of Inference
[Figure: network X1 → X2, X1 → X3 with P(X1), P(X2 | X1), P(X3 | X1)]
• X2 and X3 are observed
• X1 is unknown, to be inferred
• Possible states:
$$X_1 = \{x_{1,1}, \dots, x_{1,r_1}\}, \quad X_2 = \{x_{2,1}, \dots, x_{2,r_2}\}, \quad X_3 = \{x_{3,1}, \dots, x_{3,r_3}\}$$

Bayes Theorem
• Bayes’ rule* to infer X1 when X2 = x2,r2 and X3 = x3,r3:
$$P(X_1 \mid x_{2,r_2}, x_{3,r_3}) = \frac{P(x_{2,r_2}, x_{3,r_3} \mid X_1)\, P(X_1)}{P(x_{2,r_2}, x_{3,r_3})}$$
• Marginalization of the observed variables, using the conditional independence of X2 and X3 given X1:
$$P(x_{2,r_2}, x_{3,r_3}) = \sum_{k=1}^{r_1} P(X_1 = x_{1,k}) \prod_{j=2}^{3} P(x_{j,r_j} \mid x_{1,k})$$
• The posterior probability distribution sums to one:
$$\sum_{h=1}^{r_1} P(X_1 = x_{1,h} \mid x_{2,r_2}, x_{3,r_3}) = 1$$
*F. Jensen, Bayesian Networks
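Applying the rule to the three-node example network (X1 → X2, X1 → X3) with X2 = x2,1 and X3 = x3,1 observed gives a concrete posterior. A minimal sketch using that example's CPTs; the choice of observed states is illustrative, and brute-force enumeration like this is only practical for small networks.

```python
# CPTs from the three-node example (X1 -> X2, X1 -> X3).
P_X1 = {1: 0.4, 2: 0.6}
P_X2_given_X1 = {1: {1: 0.8, 2: 0.2}, 2: {1: 0.3, 2: 0.7}}
P_X3_given_X1 = {1: {1: 0.1, 2: 0.9}, 2: {1: 0.5, 2: 0.5}}

def posterior_x1(x2, x3):
    """P(X1 | X2=x2, X3=x3) by Bayes' rule with explicit marginalization."""
    # Numerator for each state k: P(X1 = x1,k) * prod_j P(x_j | x1,k)
    unnorm = {k: P_X1[k] * P_X2_given_X1[k][x2] * P_X3_given_X1[k][x3]
              for k in P_X1}
    evidence = sum(unnorm.values())  # P(x2, x3), the marginal of the evidence
    return {k: v / evidence for k, v in unnorm.items()}

post = posterior_x1(1, 1)
print({k: round(v, 4) for k, v in post.items()})  # {1: 0.2623, 2: 0.7377}
```

The evidence term is 0.4·0.8·0.1 + 0.6·0.3·0.5 = 0.032 + 0.09 = 0.122, and the two posterior values sum to one as required.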

BN Training
• Set of training data, T, to learn BN, B = (S, Θ)
• Use a structural learning algorithm to find S_optimal
  – Search method needed to decrease the search space
  – Scoring metric
• Maximum Likelihood Estimation (MLE) algorithm to learn CPTs, P(Θ_optimal | S_optimal, T)

Joint Probability Scoring Metric
• Relative measure of the level of compatibility of S^h given the training data T, P(S^h | T)
• P(S^h | T) ∝ P(S^h, T):
$$P(S^h, T) = P(S^h \mid T)\,P(T) \;\Rightarrow\; \frac{P(S^h_i \mid T)}{P(S^h_j \mid T)} = \frac{P(S^h_i, T)/P(T)}{P(S^h_j, T)/P(T)} = \frac{P(S^h_i, T)}{P(S^h_j, T)}$$
$$\therefore\; P(S^h_i \mid T) < P(S^h_j \mid T) \iff P(S^h_i, T) < P(S^h_j, T)$$

Scoring Metric Assumptions
1. All variables are discrete
2. All structures are equally likely
3. All variables are observed, with no missing values
4. All cases in T occur independently given a BN model
5. No prior knowledge of the numerical properties to assign to B^h with structure S^h before observing T
$$B^h = (S^h, \Theta^h) \in \mathcal{B}$$
$$P(S^h, T) = \int_{\Theta} f(T \mid S^h, \Theta^h)\, f(\Theta^h \mid S^h)\, P(S^h)\, d\Theta^h \;\Rightarrow\; P(S^h, T) = P(S^h) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$
G.F. Cooper et al., Machine

Variable Definition for Scoring Metric
$$P(S^h, T) = P(S^h) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$
• n: number of model variables
• q_i: number of unique instantiations of π_i, where π_i = pa(X_i)
• r_i: number of possible states for X_i
• N_ijk: number of cases in T in which X_i = x_i,k and π_i is instantiated as w_ij, where k = 1, …, r_i
• N_ij: $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$
G.F. Cooper et al., Machine
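Because the factorials overflow quickly, the per-family factor of this metric is usually evaluated in log space, using ln m! = lgamma(m + 1). A minimal sketch, assuming the counts N_ijk for one variable X_i have already been tabulated as a list of rows (one row per parent instantiation w_ij, one entry per state k); the function name is illustrative.

```python
from math import exp, lgamma

def log_family_score(counts):
    """ln of prod_j (r_i - 1)! / (N_ij + r_i - 1)! * prod_k N_ijk!

    counts[j][k] = N_ijk for parent instantiation j and child state k.
    """
    score = 0.0
    for row in counts:
        r = len(row)     # r_i, number of states of X_i
        n_ij = sum(row)  # N_ij = sum_k N_ijk
        score += lgamma(r) - lgamma(n_ij + r)      # ln (r-1)! - ln (N_ij+r-1)!
        score += sum(lgamma(n + 1) for n in row)   # ln prod_k N_ijk!
    return score

# Binary X_i with one binary parent: two parent instantiations.
counts = [[3, 1], [0, 2]]
print(round(exp(log_family_score(counts)), 6))  # 0.016667
```

By hand: the first row gives 1!·3!·1!/5! = 6/120 = 0.05 and the second 1!·0!·2!/3! = 2/6, so the factor is 0.05/3 = 1/60.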

K2 Algorithm
• Maximizes the scoring metric
• Node ordering: X1, X2, X3
• Limit on the number of parents
$$P(S^h_{optimal}, T) = \max_{S^h}\left[P(S^h, T)\right] \;\Rightarrow\; \max_{S^h}\left\{P(S^h) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!\right\} \xrightarrow{\text{K2}} \max_{\pi_i}\left\{g_i = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!\right\}$$
• With P(S^h) uniform and the node ordering fixed, the product is maximized by maximizing each node's factor g_i over its parent set π_i
G.F. Cooper et al., Machine
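The K2 search can be sketched as follows: given a node ordering and a parent limit, each node starts with no parents and greedily adds the earlier node that most increases ln g_i, stopping when no single addition helps. A minimal sketch, assuming complete discrete data as a list of dicts and a uniform structure prior; the function names and toy data are illustrative.

```python
from itertools import product
from math import lgamma

def log_g(data, child, parents, states):
    """ln g_i for `child` given a parent set (Cooper-Herskovits metric)."""
    r = len(states[child])
    score = 0.0
    # One term per parent instantiation w_ij; an empty parent set gives one term.
    for w in product(*(states[p] for p in parents)):
        rows = [c for c in data if all(c[p] == v for p, v in zip(parents, w))]
        counts = [sum(1 for c in rows if c[child] == s) for s in states[child]]
        score += lgamma(r) - lgamma(len(rows) + r)   # ln (r_i-1)!/(N_ij+r_i-1)!
        score += sum(lgamma(n + 1) for n in counts)  # ln prod_k N_ijk!
    return score

def k2(data, order, states, max_parents):
    """Greedy K2 search; returns {node: [parents]}."""
    parents = {x: [] for x in order}
    for i, x in enumerate(order):
        best = log_g(data, x, parents[x], states)
        candidates = list(order[:i])  # only nodes earlier in the ordering
        while len(parents[x]) < max_parents and candidates:
            new, z = max((log_g(data, x, parents[x] + [z], states), z)
                         for z in candidates)
            if new <= best:
                break  # no single added parent improves g_i
            best = new
            parents[x].append(z)
            candidates.remove(z)
    return parents

# Toy data: B copies A, C is independent of both (20 cases).
data = [{"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1)] * 5
states = {"A": (0, 1), "B": (0, 1), "C": (0, 1)}
result = k2(data, ["A", "B", "C"], states, max_parents=2)
print(result)  # {'A': [], 'B': ['A'], 'C': []}
```

Scoring each node separately is what makes the greedy search tractable; the node ordering removes the need to check for cycles.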

K2’ Algorithm
[Figure: example network with X4 → X1, X1 → X2, X4 → X2, X1 → X3 and CPTs P(X4), P(X1 | X4), P(X2 | X1, X4), P(X3 | X1)]
• Inhibit nodal connections between input nodes
  – d-separation: X3 ⊥ {X2, X4} iff X1 is known
  – X2 and X3 are not affected by the X4 → X1 relationship if their parents are known
• Everything else is the same as K2
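The change K2’ makes can be expressed as a filter on the candidate parents: as in K2, only earlier nodes in the ordering qualify, but an input node may not take another input node as a parent, so no input-input edge can appear in either direction. A minimal sketch; `allowed_parents` is a hypothetical helper, and placing outputs before inputs in the ordering is an illustrative assumption.

```python
def allowed_parents(order, inputs, node):
    """Candidate parents of `node` under the K2' rule.

    As in K2, only nodes earlier in `order` are candidates; additionally,
    if `node` is an input, other input nodes are excluded.
    """
    predecessors = order[:order.index(node)]
    if node in inputs:
        return [z for z in predecessors if z not in inputs]
    return list(predecessors)

order = ["O1", "O2", "I1", "I2"]
inputs = {"I1", "I2"}
print(allowed_parents(order, inputs, "I2"))  # ['O1', 'O2']
print(allowed_parents(order, inputs, "O2"))  # ['O1']
```

Replacing "all predecessors" with this filter in a K2 search gives the K2’ variant while leaving the scoring untouched.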

The Learned Structure
• Initial structure is an empty graph
• Final structure is learned from T
[Figure: learned structure over output nodes O1, O2, …, Om and input nodes I1, I2, …, Ik; the connections marked "Only for K2" are absent from the K2’ structure]

Parameter Learning
• Maximum Likelihood Estimator (MLE) for parameter learning, since there are no missing values (EM algorithm otherwise)
• MLE determines the parameters that maximize the probability (likelihood) of the t cases C_1, …, C_t in T:
$$f(T \mid \Theta^h, S^h) = \prod_{i=1}^{t} f(C_i \mid \Theta^h, S^h) = L(\Theta^h \mid T, S^h)$$
$$\Lambda = \ln L = \sum_{i=1}^{t} \ln f(C_i \mid \Theta^h, S^h) \;\Rightarrow\; \frac{\partial \Lambda}{\partial \Theta^h} = 0$$
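For multinomial CPTs, solving ∂Λ/∂Θ^h = 0 gives the closed form θ_ijk = N_ijk / N_ij: each CPT entry is the fraction of training cases, among those with parent instantiation w_ij, in which X_i takes state k. A minimal sketch, assuming complete data as a list of dicts; the G → SR structure and the toy cases are illustrative only. A parent instantiation never seen in T (N_ij = 0) has no MLE, and this sketch reports it as None.

```python
from collections import Counter
from itertools import product

def mle_cpt(data, child, parents, states):
    """Return {parent instantiation: {state: theta}} with theta = N_ijk / N_ij."""
    cpt = {}
    for w in product(*(states[p] for p in parents)):
        rows = [c for c in data if all(c[p] == v for p, v in zip(parents, w))]
        n_ij = len(rows)
        counts = Counter(c[child] for c in rows)
        # N_ij = 0: the MLE is 0/0, so no estimate is available.
        cpt[w] = ({s: counts[s] / n_ij for s in states[child]}
                  if n_ij else None)
    return cpt

data = [{"G": "male", "SR": "no"}, {"G": "male", "SR": "yes"},
        {"G": "male", "SR": "no"}, {"G": "female", "SR": "no"}]
states = {"G": ("male", "female"), "SR": ("yes", "no")}
cpt = mle_cpt(data, "SR", ["G"], states)
print(cpt[("male",)])    # {'yes': 0.3333333333333333, 'no': 0.6666666666666666}
print(cpt[("female",)])  # {'yes': 0.0, 'no': 1.0}
```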

Modeling Example
• Train a network model for a simple problem with two inputs and two outputs:
  – Inputs, CSA: (1) Place of aggression characteristics (2) Amount of disorder provoked by the fight/struggle
  – Outputs, criminal profile: (1) Gender of the offender (2) Presence of sexual relations between victim and offender
• Train the model with simulated cases using Matlab
• Use evidence of the inputs to infer the outputs in new cases

Example: Variable Definition
Inputs from the Crime Scene Analysis (CSA) checklist (evidence):
• Characteristics of the place of aggression, node PA
  (1) Not crowded external place (remote)
  (2) Semi-crowded external place (semi-remote)
  (3) Crowded external place
  (4) Inner place (room, building, office)
• DiSorder provoked by fight/struggle, node DS
  (1) In the room/area where the corpse is found
  (2) Throughout the area/room/study/office/store
  (3) In the vicinity of the area/room/study/office/store
  (4) No disorder provoked
Outputs from the network model, criminal profile:
• Gender of offender, node G
  (1) Male
  (2) Female
• Presence of sexual relations between victim and offender, node SR
  (1) Yes
  (2) No

Network Model 1
Solved case: PA = 2 (Semi-remote), DS = 4 (No disorder), G = 1 (Male), SR = 2 (No)
Network variables: PA (Place of Aggression), DS (DiSorder provoked by fight), G (Gender of offender), SR (Sexual Relations between victim and offender)

Results: Percent Binary Error for Validation Set
• Error metric: if x = x*, then error = 0; if x ≠ x*, then error = 1
[Figure: Percentage of each output node inferred incorrectly vs. number of training cases]
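Averaging the binary error over the validation set gives the per-node percentage plotted on this slide. A minimal sketch, assuming predicted and observed states are stored per case as dicts keyed by output node; the node names reuse G and SR from the modeling example, and the values are made up.

```python
def percent_binary_error(predicted, observed, nodes):
    """Percentage of validation cases in which each output node is wrong.

    error = 0 if x == x*, else 1, averaged over cases and scaled to %.
    """
    result = {}
    for node in nodes:
        errors = [int(p[node] != o[node]) for p, o in zip(predicted, observed)]
        result[node] = 100.0 * sum(errors) / len(errors)
    return result

predicted = [{"G": 1, "SR": 2}, {"G": 1, "SR": 1}, {"G": 2, "SR": 2}, {"G": 1, "SR": 2}]
observed = [{"G": 1, "SR": 2}, {"G": 1, "SR": 2}, {"G": 1, "SR": 2}, {"G": 1, "SR": 2}]
err = percent_binary_error(predicted, observed, ["G", "SR"])
print(err)  # {'G': 25.0, 'SR': 25.0}
```

Repeating this for models trained on increasing numbers of cases yields one point per training-set size on the error curve.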

Full CP Model
• 247 cases:
  – 200 training cases (T)
  – 47 validation cases (V)
• 57 total variables
  – 36 CS variables (inputs)
  – 21 CP variables (outputs)

Internal Stability
• Internal stability refers to the consistency of predictions

Overall Performance Summary
• Overall Predictive Accuracy (OPA):
$$\mathrm{OPA}(\%) = \frac{K_{C,CL(\ge 50\%)}}{K_t} \times 100$$
K_t: total predictions; K_C,CL(≥50%): total correct predictions with CL ≥ 50%

Algorithm | K2 | K2’
Accuracy (%) | 64.1% | 79%
Correct predictions (number of nodes) | 633 | 780
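As a check on the table, K_t is the number of output-node predictions over the validation set. Assuming each of the 21 CP variables is predicted for each of the 47 validation cases (figures from the Full CP Model slide), K_t = 47 × 21 = 987, which reproduces both reported accuracies. A minimal sketch:

```python
def opa(correct_with_cl50, total_predictions):
    """Overall predictive accuracy in percent."""
    return 100.0 * correct_with_cl50 / total_predictions

K_t = 47 * 21  # validation cases x CP (output) variables = 987
print(round(opa(633, K_t), 1))  # 64.1  (K2)
print(round(opa(780, K_t), 1))  # 79.0  (K2')
```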

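Confidence Level of Predictions
• Confidence Level Accuracy (CLA):
$$\mathrm{CLA}(\%) = \frac{K_{C,CL}}{K_{CL}} \times 100$$
K_CL: predictions whose CL falls in the range; K_C,CL: correct predictions in that range; Δ(K_C,CL) = K_C,CL(K2’) − K_C,CL(K2)

CL range | K_CL (K2 || K2’) | K_C,CL (K2 || K2’) | CLA(%) (K2 || K2’) | Δ(K_C,CL)
0.5 ≤ CL < 0.7 | 225 || 262 | 140 || 162 | 62.2 || 61.8 | 22
0.7 ≤ CL < 0.9 | 405 || 470 | 334 || 386 | 82.5 || 82.1 | 52
0.9 ≤ CL | 168 || 255 | 159 || 232 | 94.6 || 91 | 73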
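The CLA table bins each prediction by its confidence level and computes the hit rate per bin. A minimal sketch, assuming each prediction is available as a (confidence, is_correct) pair, with confidence taken as the posterior probability of the predicted state; the bin edges follow the table and the sample pairs are made up.

```python
def cla_table(predictions, edges=(0.5, 0.7, 0.9, 1.0)):
    """Return [(low, high, K_CL, K_C_CL, CLA%)] for each CL bin.

    Bins are low <= CL < high, except the last, which includes CL == 1.0.
    Predictions with CL below edges[0] are ignored.
    """
    rows = []
    for i, (low, high) in enumerate(zip(edges, edges[1:])):
        last = i == len(edges) - 2
        in_bin = [ok for cl, ok in predictions
                  if low <= cl and (cl < high or (last and cl <= high))]
        k_cl, k_c = len(in_bin), sum(in_bin)
        cla = 100.0 * k_c / k_cl if k_cl else float("nan")
        rows.append((low, high, k_cl, k_c, cla))
    return rows

preds = [(0.55, True), (0.6, False), (0.8, True), (0.95, True), (1.0, True)]
rows = cla_table(preds)
for row in rows:
    print(row)
# (0.5, 0.7, 2, 1, 50.0)
# (0.7, 0.9, 1, 1, 100.0)
# (0.9, 1.0, 2, 2, 100.0)
```

Applied to the reported K2 counts for the lowest bin, 140 / 225 gives the 62.2% shown in the table.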

Zero Marginal Probability Variables
Algorithm: K2 || K2’
K_CL for CL ≥ 50%: 798 || 987
• Predictive accuracy (PA), with K_w wrong predictions and K_ZMP ZMP predictions:
$$\mathrm{PA}(\%) = \frac{K_t - (K_w + K_{ZMP})}{K_t} \times 100$$
• Ways to reduce ZMP variables:
1. Add more training cases
2. Declare variable independencies (K2’)
3. Decrease the number of system variables
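One plausible source of ZMP variables, consistent with the remedies listed, is a parent instantiation of an output node that never occurs in T: its CPT row has N_ij = 0, the MLE is undefined, and no valid prediction can be made when a validation case presents that instantiation. The sketch below flags such instantiations and evaluates PA. The training cases are hypothetical. The K2 inputs K_w = 798 − 633 = 165 and K_ZMP = 987 − 798 = 189 are derived from the tables, assuming each of the 987 predictions is either correct, wrong, or ZMP.

```python
from itertools import product

def unseen_parent_configs(data, parents, states):
    """Parent instantiations w_ij with N_ij = 0 in the training data."""
    seen = {tuple(c[p] for p in parents) for c in data}
    return [w for w in product(*(states[p] for p in parents)) if w not in seen]

def pa(k_t, k_w, k_zmp):
    """Predictive accuracy (%), counting wrong and ZMP predictions as failures."""
    return 100.0 * (k_t - (k_w + k_zmp)) / k_t

# Hypothetical training cases for two binary parents of one output node.
data = [{"pen": 0, "blnd": 0}, {"pen": 1, "blnd": 0}, {"pen": 0, "blnd": 0}]
states = {"pen": (0, 1), "blnd": (0, 1)}
print(unseen_parent_configs(data, ["pen", "blnd"], states))  # [(0, 1), (1, 1)]

# K2 values derived from the tables: K_t = 987, K_w = 798 - 633, K_ZMP = 987 - 798
print(round(pa(987, 798 - 633, 987 - 798), 1))  # 64.1
```

The derived K_ZMP of 189 matches the K2 value reported on the HFM slide, which supports this bookkeeping.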

High Frequency Variables
3. Decrease the number of system variables:
  – High-frequency (HF) variables are present in more than 50% of cases
  – High Frequency Model (HFM): CP model with HF variables removed

CS behavior | Frequency
Face not hidden | 88.4%
Victim found at scene where killed | 78.9%
Victim found face up | 61.1%
Multiple wounds to the body | 52.2%
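Building the HFM amounts to dropping every variable whose frequency of occurrence exceeds 50%. A minimal sketch, assuming binary variables coded 1 = present; the 0.5 threshold is from the slide, while the variable names and toy data are made up. Note that a variable present in exactly half the cases is kept.

```python
def high_frequency_variables(data, variables, threshold=0.5):
    """Variables present (== 1) in more than `threshold` of the cases."""
    n = len(data)
    return [v for v in variables
            if sum(case[v] for case in data) / n > threshold]

def drop_variables(data, to_drop):
    """Copy of the data with the given variables removed (the HFM input)."""
    return [{k: x for k, x in case.items() if k not in to_drop} for case in data]

data = [
    {"face_not_hidden": 1, "blnd": 0, "pen": 1},
    {"face_not_hidden": 1, "blnd": 0, "pen": 0},
    {"face_not_hidden": 1, "blnd": 1, "pen": 0},
    {"face_not_hidden": 0, "blnd": 0, "pen": 1},
]
hf = high_frequency_variables(data, ["face_not_hidden", "blnd", "pen"])
print(hf)                           # ['face_not_hidden']
print(drop_variables(data, hf)[0])  # {'blnd': 0, 'pen': 1}
```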

HFM Overall Predictive Accuracy
• Negligible accuracy increase for K2’
• Decrease in the number of ZMP variables for K2

Algorithm | K2HFM | K2’HFM | K2 | K2’
OPA (%) | 66% | 79.6% | 64.1% | 79%
K_C,≥50% | 652 | 786 | 633 | 780
K_ZMP | 168 | 0 | 189 | 0

Frequency of Occurrence
• Frequency of occurrence is the number of times the variable was present in the dataset
• The frequency method (F) predicts the states of V by the more frequent state in T
• The CP produced by F is the same over all of V
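The frequency method ignores the evidence: for each output variable it predicts the state seen most often in T, which is why its profile is identical for every validation case. A minimal sketch, assuming the reported confidence is that state's relative frequency in T; the variable names reuse the CP labels and the training cases are made up.

```python
from collections import Counter

def frequency_profile(training, outputs):
    """Predict each output by its most frequent state in T.

    Returns {variable: (predicted_state, relative_frequency)}; the same
    profile is issued for every validation case.
    """
    profile = {}
    for v in outputs:
        counts = Counter(case[v] for case in training)
        state, n = counts.most_common(1)[0]
        profile[v] = (state, n / len(training))
    return profile

training = [
    {"male": "P", "yoff": "A"},
    {"male": "P", "yoff": "A"},
    {"male": "P", "yoff": "P"},
    {"male": "A", "yoff": "A"},
]
print(frequency_profile(training, ["male", "yoff"]))
# {'male': ('P', 0.75), 'yoff': ('A', 0.75)}
```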

OPA for K2’ and F
• Overall Predictive Accuracy:

Algorithm | K2’ | F
Accuracy (%) | 79% | 79.3%
Correct predictions (number of nodes) | 780 | 784

Confidence Level of Predictions
• Confidence Level Accuracy (CLA):

CL range | K_CL (K2’ || F) | K_C,CL (K2’ || F) | CLA(%) (K2’ || F) | Δ(K_C,CL)
0.5 ≤ CL < 0.7 | 262 || 329 | 162 || 216 | 61.8 || 65.7 | -54
0.7 ≤ CL < 0.9 | 470 || 470 | 386 || 396 | 82.1 || 84.3 | -10
0.9 ≤ CL < 0.95 | 139 || 141 | 125 || 126 | 90 || 89.3 | -1
0.95 ≤ CL | 116 || 47 | 107 || 46 | 92.2 || 98 | 61

Information Entropy
• Information entropy (H) quantifies the certainty/uncertainty of a model
• The amount of information is related to the confidence of the prediction: less entropy means more confidence in the long run
$$H = -\sum_{i=1}^{r} p_i \log(p_i)$$
where r is the number of states and p_i is the probability of state i
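For a single output node, the entropy of its predicted distribution can be computed directly. A minimal sketch; the slide does not state the logarithm base, so base 2 (bits) is assumed here, under which a binary node ranges from 0 (certain) to 1 (a 50/50 prediction).

```python
from math import log2

def entropy(probs):
    """H = -sum p_i log2 p_i, with 0 log 0 taken as 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))            # 1.0
print(round(entropy([0.9, 0.1]), 3))  # 0.469
```

For example, a node predicted as present with confidence 0.9 still carries about 0.47 bits of uncertainty.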

H for K2’ and F
• Calculating H for the full K2’ model is infeasible
• An average of each variable’s H is a suitable measure; since conditioning cannot increase entropy, H(X_i | X_{i-1}, …, X_1) ≤ H(X_i), so:
$$H(X_1, \dots, X_k) = \sum_{i=1}^{k} H(X_i \mid X_{i-1}, \dots, X_1) \;\Rightarrow\; H(X_1, \dots, X_k) \le \sum_{i=1}^{k} H(X_i)$$
• H(K2’) = 0.43 vs. H(F) = 0.48
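The inequality is what makes the per-variable average usable: summing marginal entropies can only overestimate the joint entropy, and for two variables the gap is their mutual information. A minimal sketch with two dependent binary variables (the joint table is made up) checks this numerically and then forms the per-variable average of the kind reported as H(K2’) and H(F); base-2 logarithms are assumed.

```python
from math import log2

def entropy(probs):
    """H = -sum p log2 p, ignoring zero-probability states."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical joint distribution P(X1, X2) for two dependent binary variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x1 = [sum(p for (a, _), p in joint.items() if a == s) for s in (0, 1)]
p_x2 = [sum(p for (_, b), p in joint.items() if b == s) for s in (0, 1)]

h_joint = entropy(joint.values())
h_sum = entropy(p_x1) + entropy(p_x2)
print(round(h_joint, 3), round(h_sum, 3))  # 1.722 2.0

def average_entropy(marginals):
    """Mean per-variable entropy over a list of predicted marginals."""
    return sum(entropy(m) for m in marginals) / len(marginals)

print(round(average_entropy([p_x1, p_x2]), 3))  # 1.0
```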

CL Ranges for Predictions


Crime Scene Variables
• Input variables from the crime scene (evidence)

Input variable | Definition
I1, pen | Foreign object penetration
I2, hid | Face hidden
I3, blnd | Blindfolded
I4, blnt | Wounds caused by blunt instrument
I5, suff | Suffocation (other than strangulation)

Criminal Profile Variables
• Output variables comprising the criminal profile

Output variable | Definition
O1, yoff | Young offender (17-21 years old)
O2, thft | Criminal record of theft
O3, frd | Criminal record of fraud
O4, brg | Criminal record of burglary
O5, rlt | Relationship with victim
O6, unem | Unemployed at time of offense
O7, male | Male
O8, famr | Familiar with area of offense occurrence

Predicted Case vs. Actual Case
• Frequency Profile • K2’ Profile
-A Young: A (0.813) Young: A (0.805)
-A Theft: A (0.54) Theft: A (0.75)
-P Fraud: A (0.67) Fraud: A (0.76)
-A Burglary: A (0.67) Burglary: A (1)
Relationship: A (0.64) -A- Relationship: A (1)
Unemployed: P (0.52) -P- Unemployed: P (0.79)
-A- Male: A (1) Male: P (0.9)
Familiar w/ area: P (0.86) Familiar w/ area: P (0.91) -P

Predicted Case vs. Actual Case
• Frequency Profile • K2’ Profile
-A Young: A (0.83) Young: A (0.81)
-P Theft: A (0.54) Theft: P (0.52)
-A Fraud: A (0.67) Fraud: A (0.67)
-P Burglary: A (0.67) Burglary: A (0.54)
Relationship: A (0.64) -A- Relationship: A (0.57)
Unemployed: P (0.52) -P- Unemployed: P (0.53)
-P- Male: P (0.82) Male: P (0.9)
Familiar w/ area: P (0.86) Familiar w/ area: P (0.86) -P

Predicted Case vs. Actual Case
• Frequency Profile • K2’ Profile
-A Young: A (0.83) Young: A (0.81)
-A Theft: A (0.54) Theft: P (0.97)
-P Fraud: A (0.67) Fraud: A (0.67)
-A Burglary: A (0.67) Burglary: P (0.60)
Relationship: A (0.64) -P- Relationship: A (0.59)
Unemployed: P (0.52) -A- Unemployed: P (0.61)
-A- Male: P (1) Male: P (0.9)
Familiar w/ area: P (0.86) Familiar w/ area: P (0.87) -P

Predicted Case vs. Actual Case
• Frequency Profile • K2’ Profile
-A Young: A (0.93) Young: A (0.81)
-A Theft: A (0.54) Theft: A (0.75)
-P Fraud: A (0.67) Fraud: P (0.73)
-A Burglary: A (0.67) Burglary: A (0.82)
Relationship: A (0.64) -P- Relationship: P (0.56)
Unemployed: P (0.52) -A- Unemployed: A (0.87)
-P- Male: P (0.89) Male: P (0.9)
Familiar w/ area: P (0.86) Familiar w/ area: P (0.78) -P

Evidence: pen: penetration; blnd: blindfolded; hid: face hidden; blnt: blunt instrument; suff: suffocation
CP: yoff: young offender; frd: fraud; rlt: relationship w/ victim; thft: theft; brg: burglary; famr: familiar w/ area; unmp: unemployed
Slice of K2’ DAG …
[Figure: DAG slice with nodes yoff, thft, pen, frd, brg, unmp, suff, rlt, blnd, male, famr, hid, blnt]

Conclusions
• Because it produces no ZMP variables, the K2’ structural learning algorithm requires fewer training cases than K2
• A benefit of a BN model over the naïve frequency approach to acquiring a CP is the range of confidence levels the BN model produces in response to the evidence
• Because all of the variables are binary, the frequency approach performs better than it would if the variables had many states

Further Research
• Develop a search algorithm that increases BN performance
• Incorporate Salfati’s expressive/instrumental dichotomy to supervise training of a BN model
• Apply the method to other fields
• Combine NN and BN methods to improve model performance

Neural Network Research
• Neural networks implemented similarly to the BN

Algorithm | K2’ | NN
Nodes predicted correctly | 780 | 739
Overall PA (%) | 79% | 75%

Acknowledgments
• Special thanks to
  – Dr. Silvia Ferrari, advisor
  – Dr. Gabrielle Salfati, John Jay College of Criminal Justice
  – Dr. Marco Strano, President of the International Crime Analysis Association (ICAA)
  – My Masters Committee

April Fools' Day Origin
• April 1 was originally New Year's Day in France
• In 1582, Pope Gregory decreed that the Gregorian calendar would replace the Julian calendar
• January 1 is New Year's Day according to the new calendar
• Those who refused to accept the new calendar were April fools

Questions?


Laboratory for Intelligent Systems & Control Mechanical Engineering Duke University
