If you are still scanning to a network folder in 2026 using software designed around 2020, you are not really digitising anything. You are just moving paper problems onto a server.
Legacy capture software creates pictures of information, not usable data. That difference feels academic until you look at security, compliance, AI readiness and payroll time side by side.
Here is where the real cost shows up.
Why is SMBv1 such a serious risk in scanning workflows?
Because it is a known open door, not a theoretical weakness.
Many older scanners rely on SMBv1 to push files onto a server. That protocol is the same one exploited by WannaCry, which is why Microsoft disables it by default in modern Windows builds.
What we often see in practice is IT re-enabling SMBv1 “temporarily” to keep a scanner working. That temporary fix quietly becomes permanent. At that point, the scanner is no longer just outdated. It is a direct attack path to your file server.
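If I wanted to confirm whether a file server still answers SMBv1, one approach is to offer it only the old SMB1 dialect and see whether it replies. The sketch below is a minimal illustration, not a hardened scanner: the function names are hypothetical, it assumes direct SMB over TCP port 445, it assumes a single read returns the whole reply, and it should only be pointed at servers you are authorised to test.

```python
import socket
import struct

SMB1_MAGIC = b"\xffSMB"

def build_smb1_negotiate() -> bytes:
    """Build an SMB1 Negotiate request offering only the 'NT LM 0.12' dialect."""
    header = (
        SMB1_MAGIC
        + b"\x72"                      # command: SMB_COM_NEGOTIATE
        + b"\x00" * 4                  # status
        + b"\x18"                      # flags
        + struct.pack("<H", 0xC001)    # flags2: unicode, NT status, long names
        + b"\x00" * 2                  # process ID (high)
        + b"\x00" * 8                  # security signature
        + b"\x00" * 2                  # reserved
        + b"\x00" * 2                  # tree ID
        + struct.pack("<H", 1)         # process ID (low)
        + b"\x00" * 2                  # user ID
        + struct.pack("<H", 1)         # multiplex ID
    )
    dialects = b"\x02NT LM 0.12\x00"
    body = b"\x00" + struct.pack("<H", len(dialects)) + dialects
    message = header + body
    # Session header: one zero byte, then the message length as 3 big-endian bytes.
    return b"\x00" + len(message).to_bytes(3, "big") + message

def server_accepts_smb1(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    """Return True if the server answers an SMB1-only negotiate with an SMB1 reply."""
    # Connection errors are left to propagate so "unreachable" is not read as "safe".
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_smb1_negotiate())
        try:
            reply = sock.recv(1024)
        except OSError:
            # Servers with SMBv1 disabled typically drop the connection.
            return False
    # Bytes 37-38 hold the chosen dialect index; 0xFFFF means "none accepted".
    return (
        len(reply) >= 39
        and reply[4:8] == SMB1_MAGIC
        and reply[37:39] != b"\xff\xff"
    )
```

Calling `server_accepts_smb1("fileserver.example.internal")` (a placeholder hostname) and getting `True` back would suggest the "temporary" fix is still in place.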
How does old scanning software create “dark data”?
By flattening everything.
A legacy scan produces a pixel image wrapped in a PDF. To a system, an invoice, a contract and a photo of a whiteboard all look identical. There is no text layer, no structure and no metadata.
That matters because most automation, search and AI tools act on text, not pixels. If the content is locked in an image, it is effectively invisible to the systems you are paying to be “digital”.
Why does this break AI initiatives so quickly?
Because agentic workflows need something to read.
If you want software to extract totals, VAT numbers or due dates, the scan must contain machine-readable text. Flat PDFs stop that at the first step.
One thing we have noticed on the ground is that teams invest in AI tools and then quietly avoid using them, because half the document archive is unreadable. The software is blamed, but the root cause is capture quality.
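To make the extraction point concrete, here is a minimal sketch of pulling a VAT number, a total and a due date out of a document's text layer. The patterns and the `extract_fields` name are illustrative assumptions, tuned to one simple UK invoice layout rather than to any real product. A flat scan has no text layer, so the same function returns nothing for it.

```python
import re
from decimal import Decimal

# Illustrative patterns only; real invoices vary far more than this.
VAT_RE = re.compile(r"\bGB\s?(\d{3})\s?(\d{4})\s?(\d{2})\b")
TOTAL_RE = re.compile(
    r"\btotal(?:\s+due)?\s*:?\s*£?\s*([\d,]+\.\d{2})", re.IGNORECASE
)
DUE_RE = re.compile(
    r"\bdue\s+date\s*:?\s*(\d{1,2}/\d{1,2}/\d{4})", re.IGNORECASE
)

def extract_fields(text: str) -> dict:
    """Return whichever of vat_number, total and due_date can be found in text."""
    fields = {}
    if match := VAT_RE.search(text):
        fields["vat_number"] = "GB" + "".join(match.groups())
    if match := TOTAL_RE.search(text):
        fields["total"] = Decimal(match.group(1).replace(",", ""))
    if match := DUE_RE.search(text):
        fields["due_date"] = match.group(1)
    return fields

searchable = (
    "ACME Ltd\n"
    "VAT No: GB 123 4567 89\n"
    "Subtotal: £1,000.00\n"
    "Total: £1,200.00\n"
    "Due date: 14/03/2026"
)
flat_scan = ""  # an image-only PDF yields no text at all

print(extract_fields(searchable))
print(extract_fields(flat_scan))
```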
Where does GDPR risk creep in?
Searchability.
Under UK GDPR, subject access requests and the right to erasure require you to know what data you hold and where it lives. If documents are not OCR’d, you cannot reliably search inside them.
That creates an awkward position. You may be legally responsible for data you cannot even locate. During an audit, that is not a comfortable conversation to have.
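A rough sketch of why this bites during a subject access request: a search can only report on documents that have text to search. The `find_subject` function and the sample archive below are hypothetical, and a real search would need to handle name variants, but the shape of the problem is the same.

```python
def find_subject(documents: dict[str, str], name: str) -> tuple[list[str], list[str]]:
    """Return (documents mentioning name, documents that cannot be searched)."""
    needle = name.casefold()
    hits, unsearchable = [], []
    for doc_id, text in sorted(documents.items()):
        if not text.strip():
            # No text layer: the person may or may not appear here.
            unsearchable.append(doc_id)
        elif needle in text.casefold():
            hits.append(doc_id)
    return hits, unsearchable

# Hypothetical archive: document name -> extracted text ("" for a flat scan).
archive = {
    "contract-001.pdf": "Agreement between Acme Ltd and Jane Doe",
    "invoice-002.pdf": "Invoice to Bob Smith",
    "scan-003.pdf": "",
}

hits, unsearchable = find_subject(archive, "jane doe")
print("Mentions found in:", hits)
print("Could not be checked:", unsearchable)
```

The second list is the uncomfortable one. Every entry in it is a document you hold but cannot vouch for.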
What is the real cost of “human middleware”?
It is paid every month.
If someone scans a document, opens it, reads the numbers and manually re-keys them into Xero or Sage, you are paying skilled staff to bridge a gap software should have closed years ago.
Modern capture tools extract the data at source and push both the numbers and the document to the right place automatically. Removing that manual step routinely saves 15 to 20 hours a month for a finance role. That saving compounds quietly.
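To see how that adds up over a year, here is a small worked calculation. The 15 and 20 hour figures come from the range above; the hourly cost of £25 is purely a placeholder assumption, so substitute your own number.

```python
def annual_saving(hours_per_month: float, hourly_cost: float) -> tuple[float, float]:
    """Return (hours saved per year, cost saved per year)."""
    hours_per_year = hours_per_month * 12
    return hours_per_year, hours_per_year * hourly_cost

ASSUMED_HOURLY_COST = 25.0  # placeholder, not a figure from any survey

for hours in (15, 20):
    per_year, cost = annual_saving(hours, ASSUMED_HOURLY_COST)
    print(f"{hours} h/month -> {per_year:.0f} h/year, about £{cost:,.0f}")
```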
What does “modern” capture actually change?
It changes the output, not just the interface.
Tools like Agility produce searchable PDFs plus structured data.
That single shift turns a scan from a static record into something systems can act on.
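As a rough picture of what "structured data" can look like, the sketch below pairs a scanned file with the fields captured from it and serialises the result as JSON. The `CapturedInvoice` class and its field names are assumptions made for illustration; they are not the output format of Agility or any other specific tool.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class CapturedInvoice:
    """Hypothetical record: the searchable PDF plus the data read from it."""
    source_pdf: str
    supplier: str
    vat_number: str
    total: str
    due_date: str

def to_payload(invoice: CapturedInvoice) -> str:
    """Serialise the record so a downstream system can consume it."""
    return json.dumps(asdict(invoice), sort_keys=True)

invoice = CapturedInvoice(
    source_pdf="invoice-1042.pdf",
    supplier="ACME Ltd",
    vat_number="GB123456789",
    total="1200.00",
    due_date="2026-03-14",
)
payload = to_payload(invoice)
print(payload)
```

The document and its data travel together, which is what lets an accounts system file the PDF and post the figures in one step.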
How do the two approaches really compare?
| Feature | Legacy scanning (2020 era) | Modern intelligent capture (2026) |
| --- | --- | --- |
| Output | Flat image PDF | Searchable PDF plus structured data |
| Security | Often relies on SMBv1 | TLS, SFTP or cloud APIs |
| Searchability | Filename only | Full text content |
| Workflow | Dumps to a folder | Routes into specific systems |
| AI readiness | None | Designed for automation |
A simple test you can run today
If I were checking this internally, I would start with one file.
Open a PDF scanned last week. Try to highlight a line of text with your mouse. If nothing can be selected, or the whole page highlights as a single block, there is no text layer and that document is effectively dark data.
At that point, upgrading the scanner hardware misses the point. The value sits in upgrading the capture process.
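The highlight test can be run across a whole folder rather than one file. The sketch below is one way to do it, under stated assumptions: `read_page_texts` relies on the third-party `pypdf` package being installed, the helper names are hypothetical, and the 20-character threshold is an arbitrary cut-off for "this page has real text".

```python
def classify_pages(page_texts: list[str], min_chars: int = 20) -> list[bool]:
    """Return True for each page whose extracted text looks like a text layer."""
    # Whitespace is ignored so a page of stray spaces does not count as text.
    return [len("".join(text.split())) >= min_chars for text in page_texts]

def is_dark(page_texts: list[str], min_chars: int = 20) -> bool:
    """A document is treated as dark data if no page has usable text."""
    return not any(classify_pages(page_texts, min_chars))

def read_page_texts(path: str) -> list[str]:
    """Extract text per page. Assumes pypdf is installed (pip install pypdf)."""
    from pypdf import PdfReader  # imported here so the rest runs without it

    return [page.extract_text() or "" for page in PdfReader(path).pages]

# Stand-ins for extracted text, so the check can be tried without any PDFs.
flat_scan_pages = ["", "  \n "]
searchable_pages = ["Invoice 1042 Total due 1,200.00 GBP"]

print(is_dark(flat_scan_pages))
print(is_dark(searchable_pages))
```

With real files, `is_dark(read_page_texts("scan.pdf"))` (a placeholder path) gives the same answer as the mouse test, just faster.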
Turning scans into assets, not archives
If I were weighing this up, I would stop thinking in terms of devices and start thinking in terms of outputs. A modest scanner paired with intelligent capture software often delivers more operational value than an expensive photocopier producing unusable files.
Once scans become searchable, secure and machine-readable, everything downstream works harder for you. Until then, you are just building a larger, more expensive digital landfill.
Try Agility today.



