Dishwasher

Ebbot Dishwasher is a function that censors personal information appearing in conversations. It combines several detection methods, regex pattern matching and two anonymiser models, to find sensitive entities in text.


Regex

Regex (regular expressions) is the most straightforward method: each entity is defined by a specific pattern, and the function searches the text for matches. Regex is reliable when the input follows the expected format, but it is less effective with misspellings or more complex structures.

EMAIL

Pattern: [a-zåäöA-ZÅÄÖ0-9]+[\._]?[a-zåäöA-ZÅÄÖ0-9]+[@]\w+[.]\w{2,3}

Examples (matches):

✅ [email protected]

✅ [email protected]

✅ någon@exempel.åäö

✅ [email protected]

Examples (non-matches):

❌ @example.com

❌ [email protected]

❌ user@examp

❌ [email protected]
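As a rough illustration of how the EMAIL pattern can be used to censor text, the sketch below applies it with Python's `re` module. The function name `redact_emails` and the `[EMAIL]` placeholder are assumptions made for this example; the page does not say what Dishwasher actually substitutes for a match.

```python
import re

# Pattern copied verbatim from the EMAIL section.
EMAIL_PATTERN = re.compile(
    r"[a-zåäöA-ZÅÄÖ0-9]+[\._]?[a-zåäöA-ZÅÄÖ0-9]+[@]\w+[.]\w{2,3}"
)


def redact_emails(text, placeholder="[EMAIL]"):
    """Replace every substring that matches EMAIL_PATTERN (hypothetical helper)."""
    return EMAIL_PATTERN.sub(placeholder, text)


# A listed match: the address is replaced.
print(redact_emails("Kontakta någon@exempel.åäö i morgon."))

# Listed non-matches: no local part, and no top-level domain.
print(redact_emails("Skriv till @example.com eller user@examp."))
```

In Python 3, `\w` matches Unicode letters by default, which is why a domain such as `exempel.åäö` is accepted.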

SOCIAL NUMBER

Pattern: (19|20)([0-9]{4,6})([-+]|\s)?([0-9]{4})|([0-9]{2})([0-1][0-9][0-3][0-9])([-+]|\s)?([0-9]{4})

Examples (matches):

✅ 19900101-1234

✅ 199001011234

✅ 900101-1234

✅ 9001011234

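✅ 18991231+5678 (matched from 991231+5678 onward; the leading 18 is left out because the first alternative requires a 19 or 20 prefix)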
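To see exactly which part of each example the pattern captures, the following sketch runs it with Python's `re.finditer`. The helper name `find_social_numbers` is an assumption for this example, and search semantics (matching anywhere in the text) are assumed from the description of the regex method.

```python
import re

# Pattern copied verbatim from the SOCIAL NUMBER section,
# split over two string literals only for readability.
SOCIAL_PATTERN = re.compile(
    r"(19|20)([0-9]{4,6})([-+]|\s)?([0-9]{4})"
    r"|([0-9]{2})([0-1][0-9][0-3][0-9])([-+]|\s)?([0-9]{4})"
)


def find_social_numbers(text):
    """Return every substring matched by SOCIAL_PATTERN (hypothetical helper)."""
    return [m.group(0) for m in SOCIAL_PATTERN.finditer(text)]


for example in ["19900101-1234", "199001011234", "900101-1234",
                "9001011234", "18991231+5678"]:
    # For the last example only the short form "991231+5678" is matched,
    # since the long form requires a 19 or 20 century prefix.
    print(example, "->", find_social_numbers(example))
```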

CREDIT CARD NUMBER

Pattern: (4\d{3}|5[1-5]\d{2}|6011)(-?\s?)(\d{4})(-?\s?)(\d{4})(-?\s?)(\d{4}|3[4,7]\d{13})

| Card type | Begins with | Digits | Example |
| --- | --- | --- | --- |
| Visa | 4xxx | 16 | 4123-5678-9012-3456 |
| Mastercard | 51-55xx | 16 | 5214-5678-9012-3456 |
| Discover | 6011 | 16 | 6011-5678-9012-3456 |
| American Express | 34, 37 | 15 | 371234567890123 |
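The sketch below checks the card examples against the pattern exactly as written, using Python's `re.search`. The helper name `find_card_number` is an assumption for this example. Run this way, the three 16-digit examples match, while the standalone 15-digit American Express example does not: the `3[4,7]\d{13}` alternative sits inside the final group, so it still requires one of the 4xxx, 51-55xx, or 6011 prefixes in front of it. Whether Dishwasher covers such numbers by another route is not stated here.

```python
import re

# Pattern copied verbatim from the CREDIT CARD NUMBER section.
CARD_PATTERN = re.compile(
    r"(4\d{3}|5[1-5]\d{2}|6011)(-?\s?)(\d{4})(-?\s?)(\d{4})(-?\s?)(\d{4}|3[4,7]\d{13})"
)


def find_card_number(text):
    """Return the first substring matched by CARD_PATTERN, or None (hypothetical helper)."""
    match = CARD_PATTERN.search(text)
    return match.group(0) if match else None


for example in ["4123-5678-9012-3456", "5214-5678-9012-3456",
                "6011-5678-9012-3456", "4123 5678 9012 3456",
                "371234567890123"]:
    print(example, "->", find_card_number(example))
```

Note also that `[4,7]` is a character class containing `4`, a literal comma, and `7`; `[47]` would express "34 or 37" without the comma.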


Multilingual Anonymiser Model

This is a bidirectional, encoder-only Transformer model trained to detect and redact personally identifiable information (PII) in multilingual text. It goes beyond simple pattern matching and can identify entities in more complex or unstructured formats.

Entities detected by this model include:

  • AGE

  • BUILDINGNUM

  • CITY

  • CREDITCARDNUMBER

  • DATE

  • DRIVERLICENSENUM

  • EMAIL

  • GENDER

  • GIVENNAME

  • IDCARDNUMBER

  • PASSPORTNUM

  • SEX

  • SOCIALNUM

  • STREET

  • SURNAME

  • TAXNUM

  • TELEPHONENUM

  • TIME

  • TITLE

  • ZIPCODE
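Once a model has labelled spans of text with the entity types above, redaction amounts to replacing each span. The sketch below assumes the model output is a list of dicts with `entity_group`, `start`, and `end` keys, which is the shape returned by Hugging Face token-classification pipelines when an aggregation strategy is set. The helper `redact_entities`, the bracketed placeholder format, and the sample entities are all hypothetical; the sample is hand-written, not produced by running the model.

```python
def redact_entities(text, entities):
    """Replace each detected span with its entity label in brackets."""
    # Work backwards through the text so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        label = "[" + ent["entity_group"] + "]"
        text = text[:ent["start"]] + label + text[ent["end"]:]
    return text


# Hand-written stand-in for model output (illustrative only).
sample_text = "Anna Svensson lives in Lund."
sample_entities = [
    {"entity_group": "GIVENNAME", "start": 0, "end": 4},
    {"entity_group": "SURNAME", "start": 5, "end": 13},
    {"entity_group": "CITY", "start": 23, "end": 27},
]

print(redact_entities(sample_text, sample_entities))
```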

Swedish Anonymiser Model

Model: RecordedFuture/Swedish-NER (Hugging Face)

This model is similar to the Multilingual Anonymiser but is trained specifically on Swedish data.

Entities detected by this model include:

  • LOCATION

  • ORGANIZATION

  • PERSON

  • RELIGION

  • TITLE
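Because the two models use different label sets, the same stretch of text can be detected twice, for example as GIVENNAME plus SURNAME by one model and as PERSON by the other. This page does not describe how Dishwasher reconciles such overlaps. The sketch below shows one possible approach, keeping the longest span whenever detections overlap; the function `merge_spans`, the span format, and the sample detections are assumptions made purely for illustration.

```python
def merge_spans(*span_lists):
    """Combine spans from several detectors, keeping the longest on overlap."""
    # Consider the longest spans first; ties go to the earlier start offset.
    candidates = sorted(
        (span for spans in span_lists for span in spans),
        key=lambda s: (-(s["end"] - s["start"]), s["start"]),
    )
    kept = []
    for span in candidates:
        overlaps = any(
            span["start"] < k["end"] and k["start"] < span["end"] for k in kept
        )
        if not overlaps:
            kept.append(span)
    # Return the surviving spans in reading order.
    return sorted(kept, key=lambda s: s["start"])


# Hand-written detections for "Anna Svensson bor i Lund." (illustrative only).
multilingual = [
    {"entity_group": "GIVENNAME", "start": 0, "end": 4},
    {"entity_group": "SURNAME", "start": 5, "end": 13},
]
swedish = [
    {"entity_group": "PERSON", "start": 0, "end": 13},
    {"entity_group": "LOCATION", "start": 20, "end": 24},
]

print([s["entity_group"] for s in merge_spans(multilingual, swedish)])
```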
