Adds stringsim_join functions as requested in #71 by JBGruber · Pull Request #74 · dgrtwo/fuzzyjoin

JBGruber · 2020-10-19T16:16:27Z

I love the fuzzyjoin package, and today I wanted to understand better how exactly it works. By coincidence, I stumbled across #71 and thought that implementing it would be a good way to learn how the package works (but feel free to reject this; it was mainly a practice exercise that turned out better than I expected).

The PR still lacks some tests, but I wanted to check first whether you are interested in adding these functions.

For me, the main reason to work with similarities instead of distances is that they are standardized between 0 and 1 (at least for most methods). This matters because I usually work with longer texts of heterogeneous lengths. Newspaper articles, for example, vary significantly in length, so finding duplicates based on distance alone is basically impossible: a distance cutoff that flags near-duplicates among long articles would also match completely unrelated short texts.
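To illustrate why normalization helps, here is a minimal, language-agnostic sketch in Python. It is not the PR's R code and does not use the fuzzyjoin or stringdist APIs. It assumes similarity is defined as `1 - d / max(len(a), len(b))` for a Levenshtein distance `d`, which, as far as I know, is how the stringdist package's `stringsim()` normalizes its "lv" method. The helper names `levenshtein` and `similarity` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - d / max(len(a), len(b)), so the result lies in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1 - levenshtein(a, b) / longest


if __name__ == "__main__":
    # Two unrelated short strings.
    short_a, short_b = "cat", "dog"
    # A longer "article" and a copy with a single word changed.
    long_a = "The quick brown fox jumps over the lazy dog. " * 5
    long_b = long_a.replace("fox", "cat", 1)

    for x, y in [(short_a, short_b), (long_a, long_b)]:
        print(levenshtein(x, y), round(similarity(x, y), 3))
```

Both pairs are 3 edits apart, so any single distance cutoff treats them the same way, whereas the normalized similarity separates the near-duplicate long texts from the unrelated short strings.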

emilBeBri · 2020-10-30T09:51:22Z

Very nice, hopefully this will be merged into the main branch! Thank you.

codecov-commenter · 2025-07-19T08:46:33Z

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

Adds stringsim_join functions as requested in dgrtwo#71

364c3bc

JBGruber mentioned this pull request on Dec 27, 2020 in #71, "fuzzy join based on similarity instead of distance" (Open).

JBGruber wants to merge 1 commit into dgrtwo:master from JBGruber:master.