Card Recognition Begins

Oct 23, 2025

From Zero to Vision

I’ll admit — when this project was still just a “hobby toy” in its earliest form, I had zero experience with computer vision. I had never touched OpenCV, TensorFlow, or Torch. Still, I was eager to dive into the field — I just didn’t have a good reason to do so… until one of my good friends confessed how exhausting it was to sort his trading card collection.

At that moment, I felt confident about the hardware and software side — all familiar territory — except for computer vision. That made it both a new challenge and an exciting opportunity for me.

In this post, I’ll share some reflections and lessons from my first steps into image recognition. I’ll try not to overload it with technical details so that it stays interesting to everyone reading.

 

Setting the Goal

Reliable MTG card recognition from a photo — in under 0.5 seconds, out of 100,000+ reference images.

And the hardware — Raspberry Pi 5 (8 GB RAM), without any GPU or TPU add-ons.

I guess some of you are smiling already, thinking: “Sure… good luck with that!” And you’d be right: it does sound unrealistic.

But honestly, when has that ever stopped us? ;)

And that’s exactly what makes it interesting.

 

Getting to Know OpenCV

I guess I wouldn’t be wrong to assume that everyone who ever started exploring computer vision took their first steps with OpenCV — and for good reason. It’s a great place to begin.

Unfortunately, the best balance of accuracy and performance I managed to get was ORB/SIFT feature matching, running at around 0.05 seconds per comparison on my dev PC.

Sounds fine — until you realize that it needs to compare against 100,000+ reference images, which adds up to roughly 5,000 seconds. Every attempt I made to optimize the process didn’t come anywhere close to the desired 0.5-second limit. Other OpenCV methods could run much faster, but the detection accuracy just wasn’t there.
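
For the curious, a single pairwise comparison looked roughly like this. It’s a minimal sketch rather than my actual code (the paths, feature count, and scoring are placeholders), but it shows where the per-pair cost comes from:

```python
import cv2

# Detect ORB keypoints and descriptors once per image.
orb = cv2.ORB_create(nfeatures=500)

def descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = orb.detectAndCompute(img, None)
    return des

# Brute-force Hamming matcher: this per-pair step is what blows up
# when the reference set grows to 100,000+ cards.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def match_score(des_query, des_ref):
    matches = matcher.match(des_query, des_ref)
    # Lower average distance (and more matches) means a better candidate.
    return sum(m.distance for m in matches) / max(len(matches), 1)
```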

So I didn’t even bother running full-scale tests on the Raspberry Pi — my initial skepticism about hitting that half-second target had definitely grown… though not enough to make me give up just yet. ;)

This phase took me about a month. The results were disappointing, but it was still valuable experience — and a clear personal takeaway: feature-based methods work fine for dozens of cards, but they simply don’t scale to hundreds of thousands.

 

Optical Character Recognition (OCR)

For text recognition, I used Tesseract OCR, whose settings let you trade recognition accuracy against execution time.
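
That kind of tuning typically comes down to Tesseract’s engine and page-segmentation flags. A rough sketch of what I mean, assuming pytesseract and a placeholder crop of the card’s text region:

```python
import cv2
import pytesseract

# Load the photo and crop a text region of the card;
# the crop coordinates here are placeholders.
img = cv2.imread("card_photo.jpg", cv2.IMREAD_GRAYSCALE)
h, w = img.shape
roi = img[0:int(h * 0.12), 0:w]  # e.g., roughly the title line

# --oem selects the OCR engine (1 = LSTM only) and --psm 7 treats the
# crop as a single text line; both shift the speed/accuracy balance.
text = pytesseract.image_to_string(roi, config="--oem 1 --psm 7")
print(text.strip())
```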

After adjusting the accuracy to fit within the 0.5-second limit, the results on the Raspberry Pi 5 were disappointing: many character recognition errors appeared in visually similar symbols such as B ↔ 8, I ↔ 1, H ↔ K, S ↔ 5, and so on.

Some cards also have text printed over bright or detailed artwork, which further increases the number of errors or makes text recognition nearly impossible.

Increasing the recognition precision also increases execution time, and even then some of the same mistakes persist.

As a result, using OCR to read the card identifiers printed in the lower-left corner proved to be an unreliable approach.

Moreover, many MTG cards don’t have these identifiers at all, and recognizing cards by their title text wouldn’t distinguish between multiple reprints of the same card — not to mention sets printed in non-English languages.

 

Discovering TensorFlow

This was the next step — natural and quite predictable, I guess. My first convolutional network was trained on a simple dataset of tulips and roses. And honestly, everything about it was great — both the process and the result!

However, for our purpose, things were very different: we only have one image per card, while proper model training requires a large and diverse dataset.

For reference, just 100,000 single-card photos already take up a fair amount of disk space, and even more once loaded into RAM.

Now imagine augmenting each of them a hundred times and saving all results individually… the scale of that tragedy is beyond words. ;)

The solution was to develop a custom multithreaded image augmentor, which generated hundreds of augmented variants for each card, yet stored them all within a single file per card.

That way, the total number of files stayed the same, or could even be reduced if needed, while disk I/O remained efficient, and in some cases even faster than reading individual files.
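
The real augmentor is multithreaded and applies far more varied distortions, but the single-file-per-card idea itself is simple. A sketch under those assumptions, with a toy brightness-and-shift augmentation and NumPy’s compressed .npz format as the container:

```python
import cv2
import numpy as np

def augment(img, rng):
    # Toy augmentation: random brightness jitter plus a small shift.
    bright = np.clip(img.astype(np.int16) + rng.integers(-30, 30), 0, 255).astype(np.uint8)
    dx, dy = rng.integers(-8, 8, size=2)
    shift = np.float32([[1, 0, dx], [0, 1, dy]])
    return cv2.warpAffine(bright, shift, (img.shape[1], img.shape[0]))

def build_card_archive(card_path, out_path, n_variants=100, seed=0):
    rng = np.random.default_rng(seed)
    img = cv2.imread(card_path)
    variants = np.stack([augment(img, rng) for _ in range(n_variants)])
    # All augmented variants of one card live in a single compressed file,
    # so the file count on disk stays at one per card.
    np.savez_compressed(out_path, variants=variants)
```

Reading a card back is then a single np.load call, which keeps the number of open file handles and disk seeks low.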

 

Building My Own Convolutional Model

Unfortunately, training a conventional convolutional model for MTG card recognition didn’t go as smoothly as I hoped. The model simply refused to learn properly — and soon I realized why: the cards share too many similar visual elements.

It took a couple of months of experiments and model tweaks before I finally managed to train a working version on 100 cards. The model achieved 98–100% confidence, successfully recognizing every one of those cards — even from real photos with shifted framing or white balance distortions. Gradually, I expanded the dataset to 2,000 cards, keeping detection accuracy around 95–100%.

However, further attempts to increase the model size quickly ran into hardware limits — my RTX 3080 had only 10 GB of VRAM, and that became the main bottleneck for scaling the model any further.

Recognition tests on the dev PC were fast, and after converting the “2K model” to TFLite, I ran performance tests on the Raspberry Pi 5 (8 GB). To my surprise, it ran perfectly stable, with no crashes, and the model could complete about 50 inference passes within 0.5 seconds. That meant that within the same 0.5-second window, it was possible to run about 50 different 2K models once each, effectively covering a recognition range of around 100,000 cards.
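
A timing test like that is nothing fancy. Roughly, assuming the tflite-runtime package and a hypothetical model file name:

```python
import time
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="cards_2k.tflite", num_threads=4)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Stand-in for a preprocessed photo (assumes a float32 input model).
frame = np.random.rand(*inp["shape"]).astype(np.float32)

runs = 50
start = time.perf_counter()
for _ in range(runs):
    interpreter.set_tensor(inp["index"], frame)
    interpreter.invoke()
    _ = interpreter.get_tensor(out["index"])
elapsed = time.perf_counter() - start
print(f"{runs} passes in {elapsed:.3f} s ({elapsed / runs * 1000:.1f} ms per pass)")
```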

I was happier than ever, though the joy was slightly overshadowed by the thought of the long, tedious process of training 50 separate models.

That was the moment this stage of development came to an end — sometime in 2024, when I moved on to other parts of the project, leaving the year on a bright and optimistic note.

 

Rest Was Never an Option

The thought of having to train 50 separate models, each with 2,000 cards, and knowing that more than 90% of the total set would only be recognized by one of those models — well, that was a bit unsettling.

Ideally, it would be much better to train a single model on 100,000 cards, but for that I’d probably need something like NVIDIA’s RTX PRO 6000 (96 GB), or several of them. Such an upgrade felt… let’s say, unjustified for now. ;) Renting external GPU power was another option, but a rather delicate one.

And then it hit me — what if I simply cut off the last layer of my “2K model”?

The output would no longer be class probabilities, but rather some numerical representation that could serve as an embedding vector — a kind of “fingerprint” of each card.
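
In Keras terms (where all my models lived at that point), the trick is just re-wiring the trained classifier to output the layer before its softmax head. A sketch, with the file name and layer index as assumptions:

```python
import tensorflow as tf

# Load the trained classifier (hypothetical file name).
model = tf.keras.models.load_model("cards_2k.h5")

# Re-wire the model to stop one layer before the classification output:
# whatever used to feed the classifier head now becomes the embedding.
embedder = tf.keras.Model(inputs=model.input,
                          outputs=model.layers[-2].output)

def embed(image_batch):
    # Returns one "fingerprint" vector per card image.
    return embedder.predict(image_batch, verbose=0)
```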

To check the idea, I ran a few tests on a smaller 100-card model with all its augmented images — and, in a way, it worked.

However, some points in the embedding space were so close together that their clusters overlapped.
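
One way to put a number on that overlap (not my exact test, but the same idea) is to compare how far a card’s own augmented variants spread from their centroid against how close the nearest point of any other card gets:

```python
import numpy as np

def cluster_overlap_report(embeddings, labels):
    """embeddings: (N, D) array; labels: (N,) card ids of the augmented variants."""
    intra, inter = [], []
    for card in np.unique(labels):
        own = embeddings[labels == card]
        other = embeddings[labels != card]
        centroid = own.mean(axis=0)
        # Radius of the card's own cluster...
        intra.append(np.linalg.norm(own - centroid, axis=1).max())
        # ...versus the distance to the closest point of any other card.
        inter.append(np.linalg.norm(other - centroid, axis=1).min())
    # Overlap shows up wherever the first number approaches the second.
    return float(np.mean(intra)), float(np.mean(inter))
```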

I made a bold assumption: if I could manually tune the model’s structure and weights with finer control over what happens at each layer, maybe I could shrink the clusters and push them farther apart.

That, however, would mean stepping into a completely different training process — one that isn’t described anywhere, with no clear starting point or reference to follow.

 

Looks Like I Took a Wrong Turn...

At that stage, all my models were built and trained in TensorFlow, but when it came to exploring layers, visualizing weights, or fine-tuning them manually, the framework felt like it was fighting back. Everything took far more effort than it should have.

So I turned to PyTorch. On paper, it looked like exactly what I needed: full control and flexibility.

The only catch? I had never used it before. So… easy? Not this time either. ;)

Building a small test model with random weights was simple enough — plenty of examples online — but I had no intention of training it.

Instead, I created my own tools to visualize layers, filter weights, and other small details. That part went smoothly.
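
My tools were a bit more elaborate than this, but the core of peeking at convolutional filters in PyTorch fits in a few lines (the model and layer in the usage comment are placeholders):

```python
import torch
import matplotlib.pyplot as plt

def show_conv_filters(conv: torch.nn.Conv2d, max_filters: int = 16):
    # Weight tensor shape: (out_channels, in_channels, kH, kW).
    w = conv.weight.detach().cpu()
    n = min(max_filters, w.shape[0])
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2))
    for i, ax in enumerate(axes):
        # Show the first input channel of each filter, normalised for display.
        f = w[i, 0]
        f = (f - f.min()) / (f.max() - f.min() + 1e-8)
        ax.imshow(f.numpy(), cmap="gray")
        ax.axis("off")
    plt.show()

# Example: inspect the first conv layer of a hypothetical model.
# show_conv_filters(model.features[0])
```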

Then I started manually playing with the filters — and things went downhill fast. Every step forward seemed to come with two steps back.

A mild crisis followed: I stopped everything for a week, trying not to even think about the problem.

Eventually, I decided to do what usually works: wait it out until inspiration struck some early morning. It was the summer of 2025, and we were in Italy (sun, sea, and beach), so honestly, I made peace pretty quickly with the idea that this was, in fact, the best possible plan.

And if not... well, there was always Plan B: train 50 models with 2K cards each. Easy! ;)

 

Finally — a Good Morning! ☕

Yes, it happens to me quite often — just a few days later, I woke up one morning bursting with ideas and immediately glued myself to the laptop. Everything finally started to click into place.

Of course, my wife, the beach, and the sea still got their fair share of attention; that’s not something you joke around with. ;)

There were still a couple of weeks left of our vacation, and while I could have postponed all the ideas “until we get home,” letting that creative momentum fade would’ve been an unforgivable luxury.

So I worked whenever I could reach my laptop — sometimes literally between swims.

As things progressed, my confidence in the new direction grew stronger — but so did a worrying thought: I needed a database to store and search embedding vectors by distance.

It had to handle vectors of 1–2K dimensions and 100K+ records efficiently.

The first candidate that seemed suitable was PostgreSQL with pgvector — but unfortunately, I couldn’t find any real benchmarks for running it on a Raspberry Pi 5 (8 GB).
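
What I needed from the database looks roughly like this, sketched with psycopg2. The table layout, column names, and connection string are assumptions, and whether the Pi 5 can actually do this fast enough was exactly what I still had to find out:

```python
import psycopg2

# Hypothetical schema: cards(id int, name text, embedding vector(1024)).
conn = psycopg2.connect("dbname=cards user=pi")
cur = conn.cursor()

def nearest_cards(embedding, limit=5):
    # Serialise the query vector into pgvector's text format: "[x1,x2,...]".
    vec = "[" + ",".join(f"{x:.6f}" for x in embedding) + "]"
    # "<->" is pgvector's L2 distance operator; closest cards come first.
    cur.execute(
        "SELECT id, name, embedding <-> %(v)s::vector AS dist "
        "FROM cards ORDER BY embedding <-> %(v)s::vector LIMIT %(k)s",
        {"v": vec, "k": limit},
    )
    return cur.fetchall()
```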

 

What’s Next

This story focused only on the recognition module, which — in reality — took more than a year to develop.

I suppose it turned out a bit more personal than I originally planned… but that’s how it is. Sorry — I’m not rewriting it. ;)

Next time, I’ll switch gears to something more technical: a benchmark deep dive into PostgreSQL + pgvector running on a Raspberry Pi 5 (8 GB).

 

To be continued...

 

Want to support the project?
See how you can help →