An industry leader specializing in Data Breach Response, PII & PHI Detection, Data Security, Sensitive Information Detection, Privacy, Data Subject Access Requests, Incident Response, Data Protection, GDPR, CCPA, and Cybersecurity.
This case study shows how several AI/ML models were combined into a single, comprehensive solution. The project sought to automate classification, organization, and information extraction across large volumes of documents and images.
1 Product Manager
1 UX/UI
2 Backend Dev
7 Frontend Dev
3 Data Scientists
4 QA
2 Cloud Infra & DevOps
2 Engineers
Geographic Metrics
India, Brazil, U.S.
5 years+
1. Data Sensitivity: Securing sensitive data was a challenge. It was addressed by deploying the models in a secure cloud environment with access controls and encryption protocols.
2. Accuracy Demands: Fine-tuning each model was labor-intensive. This was addressed by leveraging synthetic datasets and manual labeling.
3. Handling Diverse Document Formats: Complex layouts made document parsing difficult. Combining Donut, Textract, and PaddleOCR allowed the system to adapt across formats.
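One way to read the third point is as a fallback chain: try one parser and move to the next when it fails or returns nothing. The sketch below illustrates that idea only. The case study does not say how Donut, Textract, and PaddleOCR were actually combined, so the function names, the ordering, and the two stub backends here are hypothetical stand-ins and do not call any of those libraries.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical parser signature: raw bytes in, extracted text (or None) out.
Parser = Callable[[bytes], Optional[str]]


def parse_with_fallback(
    document: bytes, parsers: Sequence[Tuple[str, Parser]]
) -> Tuple[str, str]:
    """Try each parser in order; return (parser_name, text) from the first success."""
    for name, parser in parsers:
        try:
            text = parser(document)
        except Exception:
            continue  # one failing backend should not stop the chain
        if text and text.strip():
            return name, text
    raise ValueError("no parser produced text for this document")


# Stub backends standing in for real models (assumptions, not real APIs).
def layout_model_stub(document: bytes) -> Optional[str]:
    # Pretend the layout-aware model only handles PDF input.
    if document.startswith(b"%PDF"):
        return document.decode("utf-8", errors="ignore")
    return None


def ocr_stub(document: bytes) -> Optional[str]:
    # Pretend plain OCR accepts anything and returns its text.
    return document.decode("utf-8", errors="ignore")


PARSERS = [("layout_model", layout_model_stub), ("ocr", ocr_stub)]
```

Calling `parse_with_fallback(b"scan text", PARSERS)` skips the layout stub and returns the OCR stub's output. In a real system each stub would wrap one of the actual services.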
Classify Images and Documents based on their content and purpose (e.g., medical forms, invoices, insurance claims, bank statements, government IDs, tax forms).
Detect Objects within images, particularly government IDs such as Social Security cards, passports, and driver's licenses, to identify relevant sections.
Extract PII securely, ensuring compliance with regulatory standards like HIPAA and GDPR.
Streamline Workflow Automation by enabling seamless integration across the AI/ML models.
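The objectives above describe a classify-then-route flow. The following is a minimal sketch of that flow, assuming a keyword scorer in place of the fine-tuned classifiers the project used. The keyword table, the `Result` type, and the step names (including the manual-review branch) are all hypothetical and are not taken from the project.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical keyword table standing in for a trained document classifier.
KEYWORDS = {
    "invoice": ("invoice", "amount due"),
    "bank_statement": ("account number", "statement period"),
    "tax_form": ("taxable income", "form 1040"),
}


def classify(text: str) -> str:
    """Return the label whose keywords appear most often, or 'unknown'."""
    lowered = text.lower()
    scores = {
        label: sum(keyword in lowered for keyword in keywords)
        for label, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


@dataclass
class Result:
    doc_type: str
    steps: List[str] = field(default_factory=list)


def process(text: str) -> Result:
    """Classify a document, then record which downstream step it is routed to."""
    result = Result(doc_type=classify(text))
    result.steps.append("classified")
    if result.doc_type == "unknown":
        result.steps.append("sent_to_manual_review")  # assumed fallback path
    else:
        result.steps.append("queued_for_pii_extraction")
    return result
```

The point of the sketch is the routing shape: every document gets a label first, and the label decides which later stage sees it.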
1. Data Ingestion & Preprocessing
5. PII Detection and Redaction
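As an illustration of the detection-and-redaction step, here is a minimal sketch that uses regular expressions only. This is a simplification: the project relied on NER and vision models, whereas the two patterns below (a U.S. SSN format and a basic email shape) are assumptions chosen for the example and would miss most real-world PII.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real coverage would need to be far broader.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Replace each match with a [LABEL] placeholder and return the findings."""
    findings: List[Tuple[str, str]] = []
    for label, pattern in PII_PATTERNS.items():
        # Record what was found before it is replaced.
        findings.extend((label, match.group()) for match in pattern.finditer(text))
        text = pattern.sub(f"[{label}]", text)
    return text, findings
```

For example, `redact("SSN 123-45-6789, mail jane@example.com")` returns the text with both values replaced by `[SSN]` and `[EMAIL]`, plus a list of what was found. In practice the findings list should itself be treated as sensitive data.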
High Accuracy Classification: Custom-tuned Donut and BERT models achieved classification accuracy of over 95%, enabling faster sorting and processing.
Efficient Object Detection: YOLO detected objects with 97% accuracy, significantly improving the workflow where section detection in medical imagery was needed.
Robust PII Extraction and Redaction: The combination of NER, Phi-3 Vision, and Azure Document AI enabled 98% accurate PII detection, supporting regulatory compliance.
Reduced Processing Time: Automation with OCR and AI reduced document processing time by 70%, freeing up resources.
Improved Data Security: Custom models for PII recognition ensured data privacy standards were upheld, reducing data breach risks.