---
license: mit
datasets:
- ai4privacy/open-pii-masking-500k-ai4privacy
language:
- en
tags:
- pii
- redaction
- anonymisation
- english
model-index:
- name: english-anonymiser-openpii-ai4privacy
  results:
  - task:
      type: token-classification
      name: PII Masking
    dataset:
      type: ai4privacy/open-pii-masking-500k-ai4privacy
      name: Open PII Masking 500K
      split: english-validation
    metrics:
    - type: f1
      value: 0.9882
      name: F1 Score
    - type: precision
      value: 0.9882
      name: Precision
    - type: recall
      value: 0.9883
      name: Recall
    - type: accuracy
      value: 0.9917
      name: Accuracy
metrics:
- f1
- precision
- recall
library_name: transformers
pipeline_tag: token-classification
---

# English Anonymiser OpenPII (Ai4Privacy)

This model is designed to **redact Personally Identifiable Information (PII)** from English text. It has been fine-tuned exclusively on the English subset of the [open-pii-masking-500k-ai4privacy](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy) dataset.

---

## Evaluation Metrics

The table below summarizes the evaluation results per PII label:

| **Label**        | **TP** | **FP** | **FN** | **Accuracy** | **Precision** | **Recall** | **F1 Score** |
|------------------|:------:|:------:|:------:|:------------:|:-------------:|:----------:|:------------:|
| SURNAME          | 3724   | 0      | 26     | 99.31%       | 100.0%        | 99.31%     | 99.65%       |
| O (Non-PII)      | 0      | 368    | 0      | 99.36%       | n/a           | n/a        | n/a          |
| TIME             | 1934   | 0      | 2      | 99.90%       | 100.0%        | 99.90%     | 99.95%       |
| DRIVERLICENSENUM | 505    | 0      | 2      | 99.61%       | 100.0%        | 99.61%     | 99.80%       |
| PASSPORTNUM      | 566    | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
| GIVENNAME        | 7557   | 0      | 163    | 97.89%       | 100.0%        | 97.89%     | 98.93%       |
| TELEPHONENUM     | 3637   | 0      | 4      | 99.89%       | 100.0%        | 99.89%     | 99.95%       |
| BUILDINGNUM      | 418    | 0      | 8      | 98.12%       | 100.0%        | 98.12%     | 99.05%       |
| AGE              | 164    | 0      | 5      | 97.04%       | 100.0%        | 97.04%     | 98.50%       |
| DATE             | 2335   | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
| CITY             | 1717   | 0      | 85     | 95.28%       | 100.0%        | 95.28%     | 97.58%       |
| TITLE            | 363    | 0      | 21     | 94.53%       | 100.0%        | 94.53%     | 97.19%       |
| IDCARDNUM        | 2008   | 0      | 12     | 99.41%       | 100.0%        | 99.41%     | 99.70%       |
| GENDER           | 120    | 0      | 1      | 99.17%       | 100.0%        | 99.17%     | 99.59%       |
| CREDITCARDNUMBER | 555    | 0      | 3      | 99.46%       | 100.0%        | 99.46%     | 99.73%       |
| SEX              | 77     | 0      | 2      | 97.47%       | 100.0%        | 97.47%     | 98.72%       |
| STREET           | 1379   | 0      | 8      | 99.42%       | 100.0%        | 99.42%     | 99.71%       |
| TAXNUM           | 343    | 0      | 14     | 96.08%       | 100.0%        | 96.08%     | 98.00%       |
| EMAIL            | 2607   | 0      | 1      | 99.96%       | 100.0%        | 99.96%     | 99.98%       |
| SOCIALNUM        | 421    | 0      | 1      | 99.76%       | 100.0%        | 99.76%     | 99.88%       |
| ZIPCODE          | 418    | 0      | 8      | 98.12%       | 100.0%        | 98.12%     | 99.05%       |

**Overall Evaluation:**

- **Accuracy:** 99.17%
- **Precision:** 98.82%
- **Recall:** 98.83%
- **F1 Score:** 98.82%
- **Total True Positives (TP):** 30,848
- **Total False Positives (FP):** 368
- **Total False Negatives (FN):** 366

**Macro-Averaged Metrics:**

- **Accuracy:** 98.56%
- **Precision:** 95.24%
- **Recall:** 93.83%
- **F1 Score:** 94.52%

---

## Model Behavior & Limitations

- **Evaluation Focus:** The metrics above reflect performance on the held-out split (`english-validation` in the metadata) of the [open-pii-masking-500k-ai4privacy](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy) dataset. Real-world performance may vary, and deployment may require additional measures. For questions, contact support (at) ai4privacy.com.

---

## Disclaimer

This model card details the evaluation metrics and fine-tuning parameters for the English anonymiser. **Please note:**

- The model is provided **as-is** under the MIT License.
- It is intended solely for redaction purposes and does not perform full PII classification.
- Users should carefully test and evaluate its performance on their own data before deploying in production environments.

---

*Ai4Privacy – Committed to protecting personal data in the age of AI.*
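
As a cross-check on the Evaluation Metrics section, the short sketch below recomputes precision, recall, and F1 from the TP/FP/FN counts reported there. The helper name `prf` is hypothetical and is not part of the model or of any library. The macro-precision line rests on an inference from the numbers, not on anything the card states: 95.24% is what results from averaging over all 21 rows with the `O (Non-PII)` row counted as 0.

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, f1) from raw counts; 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1


# Totals taken from the "Overall Evaluation" list.
overall = prf(tp=30_848, fp=368, fn=366)

# One per-label row (SURNAME) taken from the table.
surname = prf(tp=3_724, fp=0, fn=26)

# Assumption: macro precision averages 20 PII labels at 100% plus the
# O (Non-PII) row counted as 0, i.e. 21 rows in total.
macro_precision = (20 * 1.0 + 0.0) / 21

for name, (p, r, f) in {"overall": overall, "SURNAME": surname}.items():
    print(f"{name}: precision={p:.2%} recall={r:.2%} f1={f:.2%}")
print(f"macro precision (assumed averaging): {macro_precision:.2%}")
```

Under these assumptions, the recomputed values agree with the reported figures to two decimal places, which suggests the overall numbers are micro-averaged over the summed counts.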