Custom Image Vectorizer for TCG Card Recognition
Nov 14, 2025
If you missed the previous couple of posts, here’s a quick recap: several approaches to recognizing MTG cards were tested, but none delivered the right balance between performance and accuracy.
We’re still running everything on a Raspberry Pi 5 (8 GB) without any TPU or GPU, aiming to stay within 0.5 seconds from photo capture to card detection and identification, complete with its sorting properties.
So I’ve already jumped headfirst into the rabbit hole of image vectorization.
For storing reference vectors of MTG cards, I’m using PostgreSQL + pgvector. Our own benchmarks show that finding the 30 nearest vectors (dim = 2048) in a database of 128 000 entries takes about 0.36 seconds.
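For the curious, the benchmark query itself is nothing exotic. Here’s a minimal sketch using psycopg and the pgvector Python adapter; the table and column names are my stand-ins, not the actual schema:

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

# Hypothetical schema: cards(card_id, embedding vector(2048))
conn = psycopg.connect("dbname=mtg")
register_vector(conn)  # lets psycopg pass numpy arrays as pgvector values

query_vec = np.random.rand(2048).astype(np.float32)  # stand-in for a real card vector
rows = conn.execute(
    "SELECT card_id, embedding <-> %s AS dist "
    "FROM cards ORDER BY embedding <-> %s LIMIT 30",
    (query_vec, query_vec),
).fetchall()
```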
That leaves roughly 0.14 seconds for the recognition itself — which sounds unrealistic.
So naturally… challenge accepted — let’s dive even deeper down the rabbit hole! ;)
What Vectorization Really Means
Very simply put, vectorization is the process of forming a sequence of numbers, where each number reflects a single property of an object. The higher the value, the stronger that property is expressed.
In a sense, it can be seen as a spectral signature of the object, or as the coordinate of a point in space — and if you draw a line from the origin to that point, you get a vector. That’s exactly what gives the process its name: vectorization.
In practice, however, the values of a vector’s elements may represent not just one but several combined features — often learned by the model — and they can be both positive and negative.
An object can be anything: a word, a phrase, a book, an image, a product, a city or even a planet. And in our case, that object is a photo of an MTG card, or any other TCG card.
In the world of machine learning, such a vector is often called an embedding — it sounds smarter, but it’s essentially the same thing. Strictly speaking, an embedding is a vector produced by a trained model; a vector obtained in any other way has no right to call itself an embedding. In other words: “Not every vector is an embedding, but every embedding is a vector”. ;)
A Simple Example of Card Vectorization
Imagine that each card has only two features — its average visual “redness” and “blueness”. We’ll take a few cards and roughly estimate, by eye, how red and how blue they are, assigning values between 0 and 1. Now each card has its own coordinates, so we can place them on a plane for visualization.
Take a look at the illustration: you’ll see that several reddish cards cluster close together, while a few bluish ones form another group. Somewhere in between, there’s a violet card — a mix of red and blue.
We can easily tell the violet card apart from the others by distance, and we can distinguish the two main clusters from each other. However, within each cluster the distances are too small to be sure of anything.
If we want to tell apart the cards inside these merged clusters, we’ll need to add one more feature, and then another, and another — until all those overlapping cards are finally pulled further apart.
With each new feature, we increase the dimensionality of the space — moving from two dimensions to many. It quickly becomes something the human brain can’t easily grasp. Fortunately, we don’t have to: all we need is to compute the distances between points in this multi-dimensional space, and good old Earth mathematics handles that just fine. ;)
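To make this concrete, here’s a toy version of that two-feature “vectorizer”, using averaged red and blue channels. It has nothing to do with the real model (and the file names are made up), but it shows the mechanics:

```python
import numpy as np
from PIL import Image

def toy_vector(path):
    """Two hand-picked features: average 'redness' and 'blueness', both in [0, 1]."""
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    return np.array([rgb[..., 0].mean(), rgb[..., 2].mean()])

# Euclidean distance between two cards in this toy 2-D feature space
dist = float(np.linalg.norm(toy_vector("card_a.jpg") - toy_vector("card_b.jpg")))
```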
Evaluating Distances Between Vectors
To move forward with development, we first need clear criteria for evaluating the distances between card coordinates in the vector space — in other words, how well we manage to pull them apart.
I thought it would be a good idea to perform augmentation for each card — a procedure that generates several slightly modified versions of the same image. In our case, we don’t need zooming or rotation; shifting the white balance and applying small translations (left-right, up-down) will be enough. These mutations produce 72 augmented variants from a single image.
We’ll then vectorize all of them and look at the cloud of points they form. To make the concept easier to visualize, we can simplify it to a two-dimensional space. The outermost points define a useful parameter: D_aug, the augmentation diameter.
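In code, D_aug boils down to the largest pairwise distance inside one card’s augmented cloud. A minimal sketch, assuming the 72 variants are already vectorized:

```python
import numpy as np
from scipy.spatial.distance import pdist

def d_aug(aug_vectors):
    """Augmentation diameter: the largest distance between any two points
    in the cloud of vectors built from one card's augmented variants."""
    return float(pdist(np.asarray(aug_vectors)).max())
```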
I might be reinventing the wheel here… but I actually enjoy the process, and since I couldn't find any ready-made solution for recognizing TCG cards, I’ll just keep on keeping on. ;)
And now, by comparing distances between different cards and the D_aug value, we can objectively detect suspiciously close “merged” cards — the ones that expose weak spots in our vectorizer and point the way toward further improvements.
Card Vectorization in Practice
So, the task is to “invent” a transformation that converts the original card images into vectors in such a way that the distances between these vectors remain greater than D_aug.
Ideally, we’d like to keep pushing those vectors further apart while also shrinking D_aug — but in practice, D_aug tends to grow just as fast, meaning most “improvements” turn out to be… well, not that useful. ;)
Only occasionally did I manage to find a really good combination of filters, and to make such cases easier to spot, I usually normalized D_aug to 1.0 — it also looks nicer on the charts.
A good indicator of progress can be the ratio Dist_med / D_aug. By comparing this ratio before and after each change, it becomes possible to evaluate how successful the modification actually was.
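Roughly like this (the names are mine, purely for illustration):

```python
import numpy as np
from scipy.spatial.distance import pdist

def progress_score(card_vectors, d_aug_value):
    """Dist_med / D_aug: if this grows after a change, the change pulled
    cards apart faster than it inflated the augmentation clouds."""
    return float(np.median(pdist(np.asarray(card_vectors))) / d_aug_value)
```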
On the screenshot, you can see the “heatmaps” of distances — one showing distances between vector centers, and another showing distances between their boundaries. The diagonal appears black because, of course, the distance between a vector and itself is zero.
The distances between the boundaries are calculated as the center-to-center distances minus D_aug. That means any negative values indicate overlapping augmentation clusters — these are shown in red on the heatmap.
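The boundary distances fall straight out of the center-to-center matrix. Again, a sketch with illustrative names:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def overlapping_pairs(card_vectors, d_aug_value):
    """Pairs whose augmentation clouds overlap: boundary distance
    (center distance minus D_aug) is negative, i.e. the red heatmap cells."""
    centers = squareform(pdist(np.asarray(card_vectors)))
    boundary = centers - d_aug_value
    i, j = np.where(np.triu(boundary < 0, k=1))  # upper triangle, skip the diagonal
    return list(zip(i.tolist(), j.tolist()))
```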
In practice, each card has a different level of stability under augmentation and its own D_aug, but calculating them all for 100K+ cards would be too expensive, so a reasonable compromise is to use the median D_aug value computed from a random sample of about 500 cards.
Additionally, you can see a plot showing the distribution of all pairwise distances, and below it — several card pairs with the smallest distances among all.
The first two pairs look visually similar, with distances ≤ 1.0, which is exactly what we would expect. The remaining pairs have distances greater than 2.0, meaning they are visually very different — and again, that matches our expectations.
To get all this beauty, every possible pair of distances must be computed. The number of pairs is given by the formula (N² − N) / 2 — for example: 3 vectors → 3 distances, 4 → 6, 10 → 45, and so on (a quadratic dependency).
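Or, as a quick sanity check in code:

```python
def n_pairs(n):
    return (n * n - n) // 2  # unordered pairs among n vectors

print(n_pairs(100_000))  # 4_999_950_000, i.e. about 5 billion distances
```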
Keep in mind that later we’ll need to calculate distances for 100 K+ vectors, which comes to around 5 billion pairwise distances — a clear hint of some upcoming RAM or VRAM problems… but that’s a story for another time. ;)
Preparing for Large-Scale Tests
At last, the time has come to move from small local tests with 100–500 cards to something much larger.
On my RTX 3080 (10 GB), I could process at most about 5000 cards in a single run — and yes, I actually did that — but now it’s time to start filling the database and working directly with it instead.
Vectorizing all 100K+ cards takes roughly 3.5 hours. That’s not exactly blazing fast, and the bottleneck is probably PostgreSQL running inside a Docker container.
Either way, I didn’t dig into it or try to optimize anything — the full vectorization needs to be done only once, and later updates will just add a few dozen new records at a time, which is hardly a problem.
An ironclad argument! So be it. ;)
First Large-Scale Test
The first large-scale tests revealed several massive clusters — 200–300 cards each — and one gigantic cluster containing 930 cards. That was definitely not what I expected.
It turned out that those 930 cards had no actual images — just placeholder pictures saying “not yet available.” For now, these can be safely removed from the database until real photos appear.
The other clusters, however, turned out to be genuine surprises.
Examples on the screenshot:
- #1, #2 – These cards are hard to recognize because most of the area is plain white with almost no distinct visual features. The small text area doesn’t help either — our vectorizer wasn’t trained to interpret text.
- #3 – These cards share the same illustration — an MTG logo — covering about half of the image, with the rest being text. They genuinely have a lot in common, so it’s understandable that the vectorizer struggled.
- #4 – Visually similar cards with low-contrast, noisy artwork. There were several such groups in different dominant tones — blue, red, and so on.
- #5 – All the illustrations and other useful details are drawn with thin lines, which our vectorizer interpreted as text and simply ignored.
Interim Results of the First Major Test
For the first three groups, recognition without OCR is practically hopeless. So for now, I decided to remove those cards from the database and treat them as unknown during sorting to avoid misclassification.
As for the remaining groups — they clearly needed more work. And yes, that meant running the full 3.5-hour vectorization process several more times… ;) but in the end, it paid off — the fix that solved these groups ended up improving recognition quality across all cards.
Great Things Are Best Seen from Afar
I really wanted to see the overall distribution of distances across all 100K+ MTG cards — how they relate to each other in the global vector space.
We can load all vectors into VRAM — that takes only about 0.3 GB (100K+ vectors, float16, 1680 dimensions). However, keeping all 5 billion distances in memory would be far beyond what VRAM can handle.
That leaves two choices:
- Store temporary results on disk — slow but simple.
- Process distances in batches and accumulate data directly into a histogram.
The second option looked far more promising, but it’s trickier to implement — there’s no way to verify the final histogram since we can’t compare it with the full dataset. The only reasonable approach was to develop both methods, test them on small subsets, and confirm that they produce identical results.
In the end, that worked: the batch-based version runs much faster and gives consistent, reliable data. It wasn’t an easy module to build — but definitely a useful one.
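For reference, here’s a minimal PyTorch sketch of the batch-and-accumulate idea. It’s not the actual module, and the bin settings are arbitrary assumptions:

```python
import torch

def pairwise_distance_histogram(vecs, bins=400, max_dist=10.0, batch=2048):
    """Histogram of all (N² − N) / 2 pairwise L2 distances, accumulated
    batch by batch so the full distance matrix never has to fit in VRAM."""
    vecs = vecs.float()  # cdist is happier in float32
    hist = torch.zeros(bins, dtype=torch.long, device=vecs.device)
    n = len(vecs)
    for start in range(0, n, batch):
        block = vecs[start:start + batch]
        d = torch.cdist(block, vecs[start:])  # (b, n - start) distance block
        # keep only pairs (start + r, start + c) with c > r, so every
        # unordered pair is counted exactly once and the diagonal is skipped
        rows = torch.arange(len(block), device=d.device).unsqueeze(1)
        cols = torch.arange(d.shape[1], device=d.device)
        hist += torch.histc(d[cols > rows], bins=bins, min=0.0, max=max_dist).long()
    return hist
```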
On the resulting plot, the median distance landed around 2.5, which is great since it’s well above D_aug = 1.0. Some pairs even show distances above 8.0, indicating very large visual differences.
More interestingly, there are 82 614 pairs of cards with distances below D_aug — a tiny fraction compared to 5 billion total, yet still notable.
Under the histogram, a few of the closest pairs are shown — they look nearly identical. Well, they’re probably reprints, which is good news — that’s something we can work with. Still, we need to check them all, and doing it manually is not an option — another solution is needed.
Automated Processing of Merged Vectors
We can iterate through all vectors in the database and, for each one, find all nearest neighbors within D_aug = 1.0. This makes it possible to detect chains of visually similar cards, which — by linking together — form complete clusters of similar cards.
Once the clusters are formed, we can compare their sorting attributes (name, type_text, color, mana_value, mechanic, and type). Within a single cluster, these values should be identical. If any differences are found, the cluster is considered “dirty”, and the mismatched cards must be excluded and flagged for manual inspection.
This relatively simple procedure should significantly reduce the number of cards that require manual review, while also helping to identify inconsistencies in the database describing sorting attributes.
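Here’s a sketch of that chain-linking step, built on a tiny union-find. The names and data shapes are assumptions, since the real module works against the database:

```python
from collections import defaultdict

SORT_ATTRS = ("name", "type_text", "color", "mana_value", "mechanic", "type")

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def build_clusters(n_cards, close_pairs):
    """close_pairs: (i, j) index pairs with distance < D_aug.
    Chains of such pairs merge into connected clusters."""
    parent = list(range(n_cards))
    for i, j in close_pairs:
        parent[find(parent, i)] = find(parent, j)
    clusters = defaultdict(list)
    for i in range(n_cards):
        clusters[find(parent, i)].append(i)
    return [c for c in clusters.values() if len(c) > 1]

def is_dirty(cluster, cards):
    """A cluster is 'dirty' if any sorting attribute differs between its cards."""
    ref = cards[cluster[0]]  # cards: list of dicts with the sorting attributes
    return any(cards[i][a] != ref[a] for i in cluster[1:] for a in SORT_ATTRS)
```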
After processing, we end up with 46 680 single cards that didn’t belong to any cluster — meaning they are either highly unique or were excluded as “dirty”.
The process also identified 18 240 reprints, with the largest cluster containing 39 cards. However, that card was probably printed many more times — the cluster simply didn’t capture versions whose visual appearance differs significantly.
Distribution of Single Vectors
Now we’re ready to analyze the vector distribution again — this time using only single cards, meaning those that have no neighbors within D_aug.
Visually, the plot looks almost the same, but now, from 46 680 single cards, we get “only” 1 billion pairwise distances. What matters most is this: there are now just 1 033 pairs of cards with distances smaller than D_aug.
Since in this test we excluded all cards that had neighbors, we expected to have no pairs at all with a distance smaller than D_aug — and that would mean the vectorizer had done its job and the work was finished.
I’ve seen many tests before with tens, hundreds, or even thousands of pairs that weren’t visually similar to each other — each time, that meant a new cycle of experiments and improvements. This time, only 34 pairs remain, and after a quick visual check, they all look identical.
Here it is — finally! That indescribable feeling when you realize you’ve actually reached the goal after a long, hard journey.
In other words, out of more than 100 K cards, only 34 pairs remain problematic for the vectorizer — everything else can now be recognized reliably and sorted correctly by the main sorting attributes: name, type_text, color, mana_value, mechanic, and type.
Taking a Closer Look at the “Dirty” Cards
Let’s write another small helper module to examine these “dirty” cards in more detail. Out of 34 pairs, we actually have 69 unique cards — no problem, that’s expected.
Here are a few examples from the screenshots below, which illustrate the following patterns:
- In most cases, it’s not just a simple reprint — a single-sided card was later released as a double-sided version, which changed its name, mechanics, and even color (from one to two).
- Less often, the mana cost and mechanics change for unclear reasons — it doesn’t look like a misprint, but it’s hard to tell why.
- And in a few rare cases, there are genuine misprints, where a card has nothing in common with its cluster except for the artwork.
The only reliable way to recognize such cards would be to add text recognition of the small identifier printed in the bottom-left corner. However, as I mentioned earlier, OCR on the Raspberry Pi runs extremely slowly, and its error rate is too high to make this approach practical.
For now, we’ll simply keep these cards in the database, but during sorting, they’ll be routed to the “X” slot (Unknown) to avoid false matches. Their share is tiny, so this loss can be considered acceptable.
Performance Test on Raspberry Pi 5 (8 GB)
After running 200 different card images through the vectorizer, we got a median time of 0.0080 seconds per card — incredibly fast, almost unbelievable.
In the same test, I tried processing the same 200 images in a single batch. On a GPU, batching usually gives a substantial speedup, since operations run in parallel — but the Raspberry Pi has no GPU, so everything runs on the CPU. Surprisingly, the result was 0.0053 seconds per card — about 1.5× faster. No idea how exactly, but I’ll take it. ;) Batch processing probably won’t be needed in practice anyway — this was mostly a test out of academic curiosity.
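The measurement itself was nothing fancy. Roughly this, with `vectorize` and `vectorize_batch` standing in for the real vectorizer API (hypothetical names):

```python
import time
import numpy as np

def median_latency(images, vectorize):
    """Median per-card time over single-image calls."""
    times = []
    for img in images:
        t0 = time.perf_counter()
        vectorize(img)
        times.append(time.perf_counter() - t0)
    return float(np.median(times))

def batch_latency(images, vectorize_batch):
    """Per-card time when all images go through in one batch."""
    t0 = time.perf_counter()
    vectorize_batch(images)
    return (time.perf_counter() - t0) / len(images)
```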
So, the original goal for the Raspberry Pi 5 (8 GB) was to stay within 0.14 seconds per card, and we achieved 0.008 seconds — which means only one thing: we’ve done far better than even the wildest optimistic expectations.
Ah, here it comes again — that feeling… mmm… somebody stop me! ;)
Summary
Well, that’s unexpected
Usually, preparing a blog post takes me about half a day — a full working day at most. This one, however, turned out massive: no matter how hard I tried to shorten it, preparing all the materials unexpectedly took about three full working days.
If you’ve read all the way to this point — well, color me surprised and genuinely grateful. It’s nice to know you found this stuff interesting. Honestly, I’m not sure I would’ve made it this far. ;)
Of course, I’ll never know for sure, so here’s a virtual like from me anyway!
And if you just scrolled straight down to the bottom — that’s okay too, but you’re getting a slightly smaller thank you. Yes, exactly. You heard me. Oh, and for the scrolling — consider yourself punished: one like, one share, and a subscription! That’ll teach you. ;)
Vectorization Quality
The vectorizer performs well and successfully distinguishes MTG card images visually.
Out of more than 100 K cards, about half have no close neighbors and are recognized with complete accuracy.
The other half forms around 18 K reprint clusters, where each cluster contains cards that are visually similar or even identical — recognition within such clusters is difficult. This isn’t a problem when sorting by the main attributes (name, type_text, color, mana_value, mechanic, and type), but for sorting by price and rarity, we had to accept a small compromise: select the neighbor with the highest value, so that rare or valuable cards won’t be missed.
Perhaps later I’ll find a better solution for this situation, but for now, this approach works well enough to move forward.
Performance on Raspberry Pi 5 (8 GB)
The median vectorization time was 0.008 seconds per card, with a vector dimension of 1680.
A search query in PostgreSQL + pgvector over a table of ~100K entries takes about 0.26 seconds on average.
This means the database search is the main time cost, while the vectorization step itself is negligibly small — fast enough to ignore altogether.
What’s Next
The vectorizer is ready, the database is filled with vectors — so now it’s time to bring all this work onto the actual device, the GlideSorter.
That means a small code cleanup first: removing all those unfortunate experimental fragments and doing a tiny refactor — really, just a small one… well, who am I kidding. :) Please forgive me in advance.
Once that’s done, it’ll be time to start building the main recognition module, where the vectorizer will be just one piece of a much larger system.
It will:
- receive a photo from the camera,
- apply color correction,
- detect the corners of the card,
- crop the image accordingly,
- run it through the vectorizer,
- perform a database search,
- and finally return the card’s identifier and sorting attributes.
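As a very rough skeleton (every helper name here is a placeholder, not final code):

```python
from dataclasses import dataclass

@dataclass
class Match:
    card_id: str
    sort_attrs: dict

def recognize(raw_photo, *, correct_colors, detect_corners, crop, vectorize, search):
    """From camera frame to card identity; each stage is injected as a
    callable, since the real implementations will live in their own modules."""
    frame = correct_colors(raw_photo)     # white balance / exposure fix
    corners = detect_corners(frame)       # four corner points of the card
    card_img = crop(frame, corners)       # perspective-corrected crop
    vec = vectorize(card_img)             # 1680-dim vector
    match: Match = search(vec)            # PostgreSQL + pgvector lookup
    return match.card_id, match.sort_attrs
```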
The sorting process still won’t be active on the device yet — but from that point on, it will already know everything about every card that passes through it, bringing us one big step closer to full sorting.
To be continued...
Want to support the project?
See how you can help →