Probabilistic matching algorithm methods for socio-demographic data

ndekerorguen · October 21, 2022, 8:04pm

Does anyone have studies comparing Damerau-Levenshtein and Jaro-Winkler? The only one I have so far is Waruru, 2019: ‘Where No Universal Health Care Identifier Exists: Comparison and Determination of the Utility of Score-Based Persons Matching Algorithms Using Demographic Data’.
I’d like to see others, if anyone has any.

Thanks

toanong · November 2, 2022, 9:15pm

Found a good link on this topic: Jaro Winkler vs Levenshtein Distance | Medium

In my own experience, Levenshtein distance also works very well on names. The best choice depends on the type of data you are comparing, though, so it would help to know which specific socio-demographic variables you are using.
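
For anyone who wants to try the comparison on their own fields, here is a minimal sketch in plain Python with no external libraries. The function names and the sample name pairs are made up for illustration and do not come from the linked article or the Waruru study. Two assumptions to note: the Damerau-Levenshtein variant shown is the restricted "optimal string alignment" form, and Jaro-Winkler uses the commonly cited defaults (prefix scale 0.1, prefix capped at 4 characters).

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment):
    like Levenshtein, but an adjacent transposition counts as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def jaro(a: str, b: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order, counted in pairs.
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


# Hypothetical name pairs, lowercased before comparison.
pairs = [("martha", "marhta"), ("jonathan", "jonathon"), ("smith", "smyth")]
for left, right in pairs:
    print(left, right,
          levenshtein(left, right),
          osa_distance(left, right),
          round(jaro_winkler(left, right), 4))
```

The two edit distances are counts (lower is closer), while Jaro-Winkler is a similarity between 0 and 1 (higher is closer), so they need separate thresholds. A swapped pair of letters such as "martha" / "marhta" costs 2 under Levenshtein but only 1 under the Damerau-style distance, which is one reason results can differ on typed names.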

ndekerorguen · November 7, 2022, 10:38pm

Thank you very much, Toan