Database & Analytics

Trading Card Database

This was a database and pricing system I built for a trading card business. It started with messy exported sales data and turned into a setup that made it a lot easier to track inventory, clean up intake, and actually look at pricing history instead of guessing.

PSA API: Barcode scan to auto-fill card details instead of typing everything by hand
BigQuery: Inventory moved out of spreadsheets and into a shared cloud database
Python: Regex and rule-based parsing to turn raw sale text into usable fields
The problem

The original sales data was messy in a very real way. A lot of the useful card info was jammed into one text field with no consistent format. Player name, set, serial number, grade, grading company, and sometimes even sale price were all mixed together depending on how somebody entered it.

That meant you really couldn't sort by player, filter by grade, or compare prices across similar cards in a reliable way. You had to read the raw text every time and piece it together yourself, which is fine for a handful of records but not for a growing inventory.

Before and after

These are the actual before-and-after screenshots from the cleanup process. Same data, but way more usable once the fields were split out and standardized:

Before: raw export
#  | Product                                                     | Sale
4  | [Seller A] - Kyler R&S Silver Patch /399                    | $3.00
5  | [Seller A] - Stroud Season Ticket Mojo                      | $15.00
6  | [Seller A] - Isaiah Spiller NT RPA /99                      | $6.00
7  | [Seller B] - Achane Optic Holo RR                           | 7
8  | [Seller A] - Josh Allen Field Vision Donruss Elite Blue /25 | $16
9  | [Seller A] - Drake London Prizm Break No Huddle             | $7.00
10 | [Seller A] - Malik Willis Optic Illusion Black Scope RC /25 | $14.00
11 | [Seller A] - Cousins Prizm Blue Ice /99                     | $5.00
12 | [Seller A] - Rashad White Obsidian //35                     | $7
13 | [Seller B] - Mahomes Optic Holo Play Action                 | $5.00
14 | [Seller A] - DHop Blue Prizm Building Blocks RC PSA 10      | $21.00
15 | [Seller A] - Olave Select White Rookie Patch                | $15.00
16 | [Seller A] - K9 Impeccable Auto /49                         | $25
17 | [Seller A] - Moody Honeycomb Mosaic RC                      | $14.00
18 | [Seller B] - Waddle Prizm Silver RC                         | $13.00
19 | [Seller A] - Kamara Prizm Silver RC PSA 9                   | $21.00
20 | [Seller A] - Rodgers Obsidian Equinox /25                   | 16
21 | [Seller A] - Stroud Wild Card Comix RC                      | $13.00
22 | [Seller B] - Achane Prizm Silver Variation RC PSA 10        | $44.00
After: structured fields
Seller   | Player              | Card                | SoldFor | Grader | Grade | DateSold   | OutOf
Seller A | Anthony Davis       | Maestro             | $26.00  | PSA    | 9     | 2023-02-03 | 25
Seller A | AJ Brown            | Gold Shimmer        | $115.00 | PSA    | 9     | 2023-02-03 |
Seller A | Anthony Edwards     | Frequent Flyers     | $80.00  | PSA    | 10    | 2023-02-03 |
Seller A | Anthony Edwards     | King of Cards       | $52.00  | PSA    | 10    | 2023-02-03 |
Seller A | Anthony Edwards     | Rev RC              | $34.00  | PSA    | 9     | 2023-02-03 |
Seller A | Anthony Edwards     | Rookie Selection    | $69.00  | PSA    | 10    | 2023-02-03 |
Seller A | Bol Bol             | Orange Ice          | $25.00  | BGS    | 9.5   | 2023-02-03 |
Seller A | Bol Bol             | RR                  | $54.00  | BGS    | 9.5   | 2023-02-03 | 249
Seller A | Bones Hyland        | Work in Progress    | $49.00  | PSA    | 10    | 2023-02-03 |
Seller A | Devin Booker        | Silver Crusade      | $9.00   | SGC    | 9.5   | 2023-02-03 |
Seller A | Tom Brady           | Bowman DC 2019      | $62.00  | PSA    | 9     | 2023-02-03 |
Seller A | Tom Brady           | Press Proof Premier | $23.00  | SGC    | 9.5   | 2023-02-03 |
Seller A | LeBron James        | Get Out of the Way  | $15.00  | PSA    | 9     | 2023-02-03 |
Seller A | LeBron James        | Gold Wave           | $440.00 | PSA    | 10    | 2023-02-03 | 10
Seller A | LeBron James        | Promo RC            | $106.00 | PSA    | 10    | 2023-02-03 |
Seller A | LeBron James        | Redemption Special  | $74.00  | PSA    | 9     | 2023-02-03 |
Seller A | Bron/Bosh/Wade/Melo |                     | $120.00 | PSA    | 8     | 2023-02-03 |
Seller A | Joe Burrow          | DC Red RC           | $54.00  | PSA    | 9     | 2023-02-03 |
Seller A | Joe Burrow          | Patch               | $100.00 | PSA    | 8     | 2023-02-03 | 10

Before: everything jammed into one field. After: separate columns for Seller, Player, Card, SoldFor, Grader, Grade, DateSold, and OutOf.

What the system does

Once the data was structured, it stopped being just a messy export and started acting like an actual inventory and pricing system. You could look up what was in stock, see what was paid for a card, and compare it against similar sales without digging through raw text.

That sounds basic, but it changes a lot once you're tracking cards across many players, sets, parallels, and grades. Instead of relying on memory, you have a clean record to work from, which makes buy/sell decisions much more grounded.

How I cleaned the data

I used Python, Pandas, and a bunch of regular expressions to pull useful fields out of the raw title strings. Some parts were easy to match because they followed patterns, like grading info or serial numbers. Other parts needed lookup rules because the formatting was inconsistent or somebody used a nickname instead of a full name.

For example, something like "LeBron /99 Prizm PSA 10 Sold $320" needed to turn into separate values for player, card, serial number, grader, grade, and sale price. The hard part was dealing with all the variations, like fields showing up in different orders or missing entirely.

It definitely was not a one-pass cleanup. Most of the work was running the parser, checking the output, finding the weird cases it missed, and then tightening the rules until the results were solid enough to trust.

clean_cards.py
# Parse a sale title into a few fields I cared about most
import re
import pandas as pd

def parse_title(raw_title):
    """Pull grading info, serial number, and sale price out of one raw title string."""
    cleaned = raw_title.strip()
    # Grading company plus grade, e.g. "PSA 10" or "BGS 9.5"
    grade_match = re.search(r'(PSA|BGS|SGC)\s*(\d+(?:\.\d+)?)', cleaned, re.IGNORECASE)
    # Serial number, e.g. "/99" or "/25"
    serial_match = re.search(r'/(\d+)', cleaned)
    # Sale price, e.g. "$21.00" or "$16"
    price_match = re.search(r'\$(\d+(?:\.\d{2})?)', cleaned)

    return {
        'grader': grade_match.group(1).upper() if grade_match else None,
        'grade': grade_match.group(2) if grade_match else None,
        'serial_number': int(serial_match.group(1)) if serial_match else None,
        'sold_for': float(price_match.group(1)) if price_match else None,
    }

# df comes from the raw export; a couple of rows for illustration
df = pd.DataFrame({'RawTitle': [
    'LeBron /99 Prizm PSA 10 Sold $320',
    '[Seller B] - Waddle Prizm Silver RC$13.00',
]})
parsed = df['RawTitle'].apply(parse_title).apply(pd.Series)
df = pd.concat([df, parsed], axis=1)

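The nickname problem mentioned above was handled with lookup rules rather than regex. Here's a minimal sketch of that side; the alias table below is illustrative, not the actual mapping I used:

```python
# Illustrative alias table: a few shorthand names mapped to full names.
# The real set of rules was larger and grew as parser output got reviewed.
PLAYER_ALIASES = {
    'dhop': 'DeAndre Hopkins',
    'bron': 'LeBron James',
    'stroud': 'C.J. Stroud',
}

def normalize_player(name_fragment):
    """Map a nickname or shorthand to a full player name when a rule exists."""
    key = name_fragment.strip().lower()
    # Fall back to the raw text when no alias rule matches
    return PLAYER_ALIASES.get(key, name_fragment.strip())
```

The fallback matters: an unmatched name passes through untouched, so a missing rule shows up as an inconsistency in the output review rather than silently dropped data.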
PSA barcode autofill tool

Cleaning up the old data was only half the job. I also needed a way to keep new records clean going forward, because typing card details by hand is slow and that's usually where inconsistencies start creeping back in.

PSA graded cards already have a certification barcode on the slab, so I built a tool around the PSA API. Scan the barcode, pull back the official card details, and drop them right into the intake form. That cut down a lot of manual entry and made the data more consistent from the start.

I also bought barcode scanners for the team, so the intake workflow looked like this:

1. Scan the barcode on the PSA slab
2. PSA API returns the official card data: player, title, set, and grade
3. Form auto-fills with the PSA information
4. User adds business data: purchase price, seller, any notes
5. Click save and the card goes into the database with cleaner, more standardized fields

The biggest win here was consistency. If the card came from PSA, the naming and grade info came straight from their records instead of depending on how somebody typed it in.
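The lookup-and-autofill part of that workflow can be sketched roughly like this. The endpoint path and response field names here are assumptions for illustration, not confirmed PSA API details; check PSA's current API documentation before relying on them:

```python
import json
import urllib.request

# ASSUMPTION: endpoint path and payload shape are illustrative, not verified
PSA_CERT_URL = "https://api.psacard.com/publicapi/cert/GetByCertNumber/{cert}"

def fetch_cert(cert_number, token):
    """Look up a scanned cert number against the PSA API (requires an API token)."""
    req = urllib.request.Request(
        PSA_CERT_URL.format(cert=cert_number),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def cert_to_form_fields(payload):
    """Map an API payload onto the intake form's fields (field names hypothetical)."""
    cert = payload.get("PSACert", {})
    return {
        "player": cert.get("Subject"),
        "grade": cert.get("CardGrade"),
        "year": cert.get("Year"),
        "set": cert.get("Brand"),
    }
```

Keeping the payload-to-form mapping in its own function made it easy to handle the no-match case: an empty payload just produces a form with blank PSA fields instead of crashing the intake flow.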

[Image: PSA-graded LeBron James trading card showing the certification barcode on the slab]
Cloud storage and reporting

After cleaning everything up, I moved the inventory into Google BigQuery. SQLite worked fine while I was building and testing things, but it made more sense to put the shared dataset somewhere the team could query without passing files around.

Once the data was in BigQuery, it got a lot easier to answer normal business questions across the full inventory. Things like average sale price for a certain player and grade, or which cards had moved up the most since they were bought, went from being annoying to pretty straightforward.
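As a sketch of the kind of question this made easy, here's the local pandas equivalent of an "average sale price by player and grade" query; in BigQuery it's the same GROUP BY expressed in SQL. The column names follow the structured table above, and the rows are toy data:

```python
import pandas as pd

# Toy rows shaped like the structured inventory table
sales = pd.DataFrame({
    "Player":  ["LeBron James", "LeBron James", "LeBron James", "Joe Burrow"],
    "Grade":   ["10", "10", "9", "9"],
    "SoldFor": [440.0, 106.0, 74.0, 54.0],
})

# Average sale price for each (player, grade) pair
avg_price = sales.groupby(["Player", "Grade"])["SoldFor"].mean()
```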

The historical pricing side mattered too. Instead of going mostly off instinct, you could look at real sales over time and get a better read on whether a card looked underpriced, overpriced, or probably worth holding.

Example: tracking a card's value over time

Here's a simple example of what that looked like in practice. This is sample data, but it shows the kind of view I wanted: buy price, market movement, and what happened by the time the card sold.

2018 Panini Revolution LeBron James (PSA 10 · /99 · Los Angeles Lakers)
Bought at: $182
Sold at: $252
Return: +$70 (+38.5%)
Hold time: 10 months

Sample data for illustration: purchase price, market movement, and final sale over a 10-month hold.
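The return figures come straight from the buy and sell prices; a quick sketch of that arithmetic:

```python
def holding_return(bought_at, sold_at):
    """Dollar profit and percent return for a single card."""
    profit = sold_at - bought_at
    pct = profit / bought_at * 100
    return profit, round(pct, 1)

# The sample card above: bought at $182, sold at $252
profit, pct = holding_return(182, 252)  # -> (70, 38.5)
```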

Tools used
Python · Pandas · SQL · SQLite · Google BigQuery · PSA API · Regular Expressions · Database Design · Barcode Scanner Integration · Automation
What I learned
- Real-world data is way messier than class examples. Cleaning it forces you to understand how the data is actually being entered, not how you wish it looked.
- Regex is useful, but it only gets you so far if you are not checking real outputs. Having a way to review edge cases after each pass saved a lot of cleanup time.
- API integrations are usually pretty manageable once you understand the auth flow and response format. The trickier part was dealing with scans that came back with no match or incomplete data.
- Moving from SQLite to BigQuery was mostly a tooling and workflow change. The SQL is similar, but having a shared cloud dataset makes a big difference once more than one person needs access.
- Structured data really does change decision-making. Once there was actual price history to look at, buying decisions got less random and more consistent.