Case Study
Classification and Data Extraction from Identity Documents
Executive Summary
A UK-based client working with various government agencies faced critical challenges in verifying identity documents submitted by citizens for public services. The manual verification process was time-consuming, error-prone, and difficult to scale. To solve this, we implemented an AI-powered solution that automates the classification of identity documents and the extraction of their data, including the verification of handwritten SRA (Solicitors Regulation Authority) numbers provided by solicitors. By integrating the Google Cloud Vision API with AI-based classification and handwriting recognition, the client now benefits from improved accuracy, greater operational efficiency, and stronger fraud detection.
Problems Faced by the Client
The client was responsible for verifying identity documents submitted by members of the public to access essential government services. The existing process was entirely manual, which led to several operational challenges:
• Human Errors: Manual data entry and document verification were prone to mistakes.
• Time-Consuming Processes: A high volume of documents led to delays in service delivery.
• Inconsistency: Varying document formats and handwritten annotations reduced reliability.
• Fraud Risk: There was no standardized way to verify solicitor-attested documents, which increased the risk of forged submissions.
Proposed Solutions
To address these issues, we developed an AI-based document
verification system with the following capabilities:
• Document Classification: Automatically identify and categorize document types (e.g., passport, driving licence, council tax bill).
• OCR & Handwriting Recognition: Extract text, including handwritten SRA numbers, from scanned document images.
• Data Verification: Cross-check extracted SRA numbers against third-party databases such as the Solicitors Regulation Authority (SRA) registry.
• Compliance Management: Ensure GDPR compliance through appropriate use of cloud services and data residency options.
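The Document Classification capability above relied on Google Cloud Vision labels and AI/ML models whose details are not described here. As a deliberately simple stand-in, the Python sketch below assigns a document type by counting keywords in the OCR text. The delivered system used .NET. The `DOCUMENT_KEYWORDS` table and the `classify_document` function are hypothetical illustrations, not the client's classifier.

```python
# Hypothetical keyword rules: a stand-in for the trained classifier / Vision
# labels used in the real system, not a reproduction of it.
DOCUMENT_KEYWORDS = {
    "passport": ["PASSPORT", "P<GBR"],
    "driving_licence": ["DRIVING LICENCE", "DVLA"],
    "council_tax_bill": ["COUNCIL TAX"],
}


def classify_document(ocr_text: str) -> str:
    """Return the document type whose keywords occur most often, or 'unknown'."""
    text = ocr_text.upper()
    scores = {
        doc_type: sum(text.count(keyword) for keyword in keywords)
        for doc_type, keywords in DOCUMENT_KEYWORDS.items()
    }
    # max() keeps the first entry on ties, i.e. dictionary insertion order.
    best_type, best_score = max(scores.items(), key=lambda item: item[1])
    return best_type if best_score > 0 else "unknown"
```

For example, `classify_document("UK DRIVING LICENCE ... 4c. DVLA")` returns `"driving_licence"`. A keyword table is easy to audit, but it fails on layouts it has not seen. That weakness is why a trained model or Vision API labels would be preferred in practice.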
How It Works
1. Image Input: The user uploads scanned identity documents into the system.
2. Data Extraction: The Google Cloud Vision API processes each image, applying Optical Character Recognition (OCR) and AI-based classification.
3. SRA Number Detection: AI algorithms detect handwritten SRA numbers where present.
4. Verification: Each extracted SRA number is validated against an authorized solicitor verification portal to confirm its authenticity.
5. Output Generation: The system compiles the extracted data, including:
   • First and last name
   • Date of birth
   • Address
   • Document identification number
   • Document type classification
6. Final Validation: The verified, structured data is forwarded to the government service portal for processing.
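Steps 3 and 4 can be sketched as follows. The sketch assumes the OCR output already contains the handwritten annotation as text, as in the sample output, which ends in `SRA number: 51873`. Several parts are assumptions made for illustration: the regular expression, the 4 to 7 digit length, and the set-based `verify_sra_number` stand-in for the solicitor verification portal. A real integration would call the portal's own API instead.

```python
import re
from typing import AbstractSet, Optional

# Assumed pattern: "SRA", optionally followed by "number", "no." or "ID" and a
# colon or '#', then 4 to 7 digits. Real handwritten annotations vary widely,
# so this would need tuning against actual documents.
SRA_PATTERN = re.compile(
    r"\bSRA\s*(?:number|no\.?|id)?\s*[:#]?\s*(\d{4,7})\b",
    re.IGNORECASE,
)


def find_sra_number(ocr_text: str) -> Optional[str]:
    """Return the first SRA number found in the OCR text, or None."""
    match = SRA_PATTERN.search(ocr_text)
    return match.group(1) if match else None


def verify_sra_number(sra_number: Optional[str], registry: AbstractSet[str]) -> bool:
    """Hypothetical stand-in for the solicitor verification portal.

    Here the 'registry' is just an in-memory set of known SRA numbers.
    """
    return sra_number is not None and sra_number in registry
```

In practice, `find_sra_number` would run on the text returned in step 2, and `verify_sra_number` would be replaced by a call to the verification portal. Keeping the two functions separate lets the lookup be swapped out without touching the parsing.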
Technologies Used
• Google Cloud Vision API: For OCR and document structure recognition.
• AI/ML Algorithms: For document classification and handwritten text recognition.
• Cloud Services: For scalable, secure processing with data residency compliance.
• Third-party API Integration: The AI model is used by a third-party SRA identity verification website.
• Development Stack: .NET and HTML.
Client Benefits
• Increased Accuracy: Significantly fewer errors in data extraction and document validation.
• Faster Processing: Verification time cut from hours to minutes, enabling quicker service delivery.
• Fraud Prevention: SRA verification confirms that the attesting solicitor is registered, reducing the risk of forged documents.
• Operational Efficiency: Automated workflows reduce dependency on manual staff and enable higher throughput.
• Regulatory Compliance: Supports GDPR compliance through careful handling of personal and sensitive data.
• Scalability: Accommodates growing document volumes without compromising performance.
Sample Extraction Output
Sample image: photocopy of a UK driving licence annotated with an SRA number
Details extracted:
Extracted Text: UK DRIVING LICENCE 1. FRITH 2. ASHLEIGH JAMES ERNEST 3. 02.03.1987 UNITED KINGDOM 4a.28.05.2021 4c. DVLA 4b.27.05.2031 5. 7 FRITH803027AJ9EN 97 FRITH 23 REDLAND WAY, AYLESBURY HP21 9RJ 9. AM/A/B/f/k/q 米 Atique Mazher SRA number: 51873
Labels: Identity document (95.85%), Paper Product (77.34%), License (60.42%), Document (58.70%)
Colors: RGB(91,84,77) – 25.27%, RGB(192,188,185) – 23.62%, RGB(163,158,152) – 19.13%, RGB(62,54,49) – 15.04%, RGB(120,114,107) – 13.28%
Landmarks: No landmarks detected
Logos: Union Jack (78.18%)
Safe Search: Adult: VERY_UNLIKELY, Violence: VERY_UNLIKELY, Racy: VERY_UNLIKELY
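The structured output from step 5 can be produced from flat OCR text like the sample above. The sketch below assumes the numbered fields printed on a UK photocard licence (1 surname, 2 forenames, 3 date of birth) and a 16-character driver number. It also assumes the address starts with a house number and ends with a UK postcode. The regular expressions and the `parse_driving_licence` function are hypothetical. They would need hardening against OCR noise, such as the stray `FRITH` token before the address in the sample.

```python
import re
from datetime import datetime
from typing import Dict, Optional

# Assumed patterns for the flat OCR string of a UK photocard licence.
FIELD_PATTERNS = {
    "last_name": re.compile(r"\b1\.\s*([A-Z][A-Z' -]*?)\s+2\."),
    "forenames": re.compile(r"\b2\.\s*([A-Z][A-Z' -]*?)\s+3\."),
    "date_of_birth": re.compile(r"\b3\.\s*(\d{2}\.\d{2}\.\d{4})"),
    # 5 surname chars, 6 digits, 2 initials, 1 digit, 2 check characters.
    "document_number": re.compile(r"\b([A-Z9]{5}\d{6}[A-Z9]{2}\d[A-Z0-9]{2})\b"),
    # Address assumed to start with a house number and end with a UK postcode.
    "address": re.compile(
        r"\b(\d+[A-Z]?\s[A-Z ,]+?\s[A-Z]{1,2}\d[A-Z\d]?\s\d[A-Z]{2})\b"
    ),
}


def parse_driving_licence(ocr_text: str) -> Dict[str, Optional[str]]:
    """Extract the step-5 output fields; fields that are not found become None."""
    raw = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        raw[name] = match.group(1).strip() if match else None

    dob = raw["date_of_birth"]
    forenames = raw["forenames"]
    return {
        "first_name": forenames.split()[0] if forenames else None,
        "last_name": raw["last_name"],
        # Licence dates are printed DD.MM.YYYY; normalise to ISO 8601.
        "date_of_birth": (
            datetime.strptime(dob, "%d.%m.%Y").date().isoformat() if dob else None
        ),
        "address": raw["address"],
        "document_number": raw["document_number"],
        "document_type": "driving_licence",
    }
```

On the sample text, this yields last name `FRITH`, first name `ASHLEIGH`, date of birth `1987-03-02`, document number `FRITH803027AJ9EN`, and address `23 REDLAND WAY, AYLESBURY HP21 9RJ`.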
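A UK driver number encodes parts of the holder's surname, date of birth and initials, so it can be cross-checked against the other extracted fields as an extra fraud signal. The case study does not describe this check. The sketch assumes the commonly documented DVLA layout:
• characters 1 to 5: surname, padded with 9s
• character 6: decade of birth
• characters 7 to 8: month, with 50 added for female holders
• characters 9 to 10: day
• character 11: final digit of the year
• characters 12 to 13: initials, padded with 9
It ignores any special-case surname rules and does not validate the final check characters. `driver_number_matches` is a hypothetical helper.

```python
from datetime import date


def driver_number_matches(
    driver_number: str, surname: str, forenames: str, dob: date
) -> bool:
    """Check the first 13 characters of a UK driver number against other fields.

    Simplified, assumed DVLA layout: special surname rules and the final
    check characters (positions 15 to 16) are not validated.
    """
    number = driver_number.upper()
    if len(number) != 16:
        return False

    # Characters 1-5: first five letters of the surname, padded with '9'.
    letters = "".join(ch for ch in surname.upper() if ch.isalpha())
    expected_surname = letters[:5].ljust(5, "9")

    # Characters 12-13: first two initials, padded with '9'.
    initials = "".join(name[0] for name in forenames.upper().split()[:2]).ljust(2, "9")

    year = f"{dob.year:04d}"
    # The extracted fields do not include the holder's sex, so accept either
    # month encoding.
    month_options = {f"{dob.month:02d}", f"{dob.month + 50:02d}"}

    return (
        number[0:5] == expected_surname
        and number[5] == year[2]          # decade of birth
        and number[6:8] in month_options  # month (+50 for female holders)
        and number[8:10] == f"{dob.day:02d}"
        and number[10] == year[3]         # final digit of birth year
        and number[11:13] == initials
    )
```

For the sample licence, `driver_number_matches("FRITH803027AJ9EN", "FRITH", "ASHLEIGH JAMES ERNEST", date(1987, 3, 2))` returns `True`. Here `8`, `03`, `02` and `7` encode 2 March 1987, and `AJ` are the first two initials.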