Database & Analytics

Trading Card Database

This was a database and pricing system I built for a trading card business. It started with messy exported sales data and turned into a setup that made it a lot easier to track inventory, clean up intake, and actually look at pricing history instead of guessing.

PSA API: Barcode scan to auto-fill card details instead of typing everything by hand
BigQuery: Inventory moved out of spreadsheets and into a shared cloud database
Python: Regex and rule-based parsing to turn raw sale text into usable fields
The problem

The original sales data was messy in a very real way. A lot of the useful card info was jammed into one text field with no consistent format. Player name, set, serial number, grade, grading company, and sometimes even sale price were all mixed together depending on how somebody entered it.

That meant you really couldn't sort by player, filter by grade, or compare prices across similar cards in a reliable way. You had to read the raw text every time and piece it together yourself, which is fine for a handful of records but not for a growing inventory.

Before and after

These are the actual before-and-after screenshots from the cleanup process. Same data, but way more usable once the fields were split out and standardized:

Before: raw export
#  | Product                                                     | Sale
4  | [Seller A] - Kyler R&S Silver Patch /399                    | $3.00
5  | [Seller A] - Stroud Season Ticket Mojo                      | $15.00
6  | [Seller A] - Isaiah Spiller NT RPA /99                      | $6.00
7  | [Seller B] - Achane Optic Holo RR                           | 7
8  | [Seller A] - Josh Allen Field Vision Donruss Elite Blue /25 | $16
9  | [Seller A] - Drake London Prizm Break No Huddle             | $7.00
10 | [Seller A] - Malik Willis Optic Illusion Black Scope RC /25 | $14.00
11 | [Seller A] - Cousins Prizm Blue Ice /99                     | $5.00
12 | [Seller A] - Rashad White Obsidian //35                     | $7
13 | [Seller B] - Mahomes Optic Holo Play Action                 | $5.00
14 | [Seller A] - DHop Blue Prizm Building Blocks RC PSA 10      | $21.00
15 | [Seller A] - Olave Select White Rookie Patch                | $15.00
16 | [Seller A] - K9 Impeccable Auto /49                         | $25
17 | [Seller A] - Moody Honeycomb Mosaic RC                      | $14.00
18 | [Seller B] - Waddle Prizm Silver RC                         | $13.00
19 | [Seller A] - Kamara Prizm Silver RC PSA 9                   | $21.00
20 | [Seller A] - Rodgers Obsidian Equinox /25                   | 16
21 | [Seller A] - Stroud Wild Card Comix RC                      | $13.00
22 | [Seller B] - Achane Prizm Silver Variation RC PSA 10        | $44.00
After: structured fields
Seller   | Player              | Card                | SoldFor | Grader | Grade | DateSold   | OutOf
Seller A | Anthony Davis       | Maestro             | $26.00  | PSA    | 9     | 2023-02-03 | 25
Seller A | AJ Brown            | Gold Shimmer        | $115.00 | PSA    | 9     | 2023-02-03 |
Seller A | Anthony Edwards     | Frequent Flyers     | $80.00  | PSA    | 10    | 2023-02-03 |
Seller A | Anthony Edwards     | King of Cards       | $52.00  | PSA    | 10    | 2023-02-03 |
Seller A | Anthony Edwards     | Rev RC              | $34.00  | PSA    | 9     | 2023-02-03 |
Seller A | Anthony Edwards     | Rookie Selection    | $69.00  | PSA    | 10    | 2023-02-03 |
Seller A | Bol Bol             | Orange Ice          | $25.00  | BGS    | 9.5   | 2023-02-03 |
Seller A | Bol Bol             | RR                  | $54.00  | BGS    | 9.5   | 2023-02-03 | 249
Seller A | Bones Hyland        | Work in Progress    | $49.00  | PSA    | 10    | 2023-02-03 |
Seller A | Devin Booker        | Silver Crusade      | $9.00   | SGC    | 9.5   | 2023-02-03 |
Seller A | Tom Brady           | Bowman DC 2019      | $62.00  | PSA    | 9     | 2023-02-03 |
Seller A | Tom Brady           | Press Proof Premier | $23.00  | SGC    | 9.5   | 2023-02-03 |
Seller A | LeBron James        | Get Out of the Way  | $15.00  | PSA    | 9     | 2023-02-03 |
Seller A | LeBron James        | Gold Wave           | $440.00 | PSA    | 10    | 2023-02-03 | 10
Seller A | LeBron James        | Promo RC            | $106.00 | PSA    | 10    | 2023-02-03 |
Seller A | LeBron James        | Redemption Special  | $74.00  | PSA    | 9     | 2023-02-03 |
Seller A | Bron/Bosh/Wade/Melo |                     | $120.00 | PSA    | 8     | 2023-02-03 |
Seller A | Joe Burrow          | DC Red RC           | $54.00  | PSA    | 9     | 2023-02-03 |
Seller A | Joe Burrow          | Patch               | $100.00 | PSA    | 8     | 2023-02-03 | 10

Before: everything jammed into one field. After: separate columns for Seller, Player, Card, SoldFor, Grader, Grade, DateSold, and OutOf.

What the system does

Once the data was structured, it stopped being just a messy export and started acting like an actual inventory and pricing system. You could look up what was in stock, see what was paid for a card, and compare it against similar sales without digging through raw text.

That sounds basic, but it changes a lot once you're tracking cards across many players, sets, parallels, and grades. Instead of relying on memory, you have a clean record to work from, which makes buy/sell decisions much more grounded.

How I cleaned the data

I used Python, Pandas, and a bunch of regular expressions to pull useful fields out of the raw title strings. Some parts were easy to match because they followed patterns, like grading info or serial numbers. Other parts needed lookup rules because the formatting was inconsistent or somebody used a nickname instead of a full name.

For example, something like "LeBron /99 Prizm PSA 10 Sold $320" needed to turn into separate values for player, card, serial number, grader, grade, and sale price. The hard part was dealing with all the variations, like fields showing up in different orders or missing entirely.

It definitely was not a one-pass cleanup. Most of the work was running the parser, checking the output, finding the weird cases it missed, and then tightening the rules until the results were solid enough to trust.

clean_cards.py
# Parse a sale title into a few fields I cared about most
import re
import pandas as pd

def parse_title(raw_title):
    """Pull grading info, serial number, and sale price out of one raw title string."""
    cleaned = raw_title.strip()
    # Grading company plus grade, e.g. "PSA 10" or "BGS 9.5"
    grade_match = re.search(r'(PSA|BGS|SGC)\s*(\d+(?:\.\d+)?)', cleaned, re.IGNORECASE)
    # Serial number, e.g. "/99" or "/25"
    serial_match = re.search(r'/(\d+)', cleaned)
    # Sale price, e.g. "$21.00" or "$16"
    price_match = re.search(r'\$(\d+(?:\.\d{2})?)', cleaned)

    return {
        'grader': grade_match.group(1).upper() if grade_match else None,
        'grade': grade_match.group(2) if grade_match else None,
        'serial_number': int(serial_match.group(1)) if serial_match else None,
        'sold_for': float(price_match.group(1)) if price_match else None,
    }

# df comes from the raw export; a couple of rows for illustration
df = pd.DataFrame({'RawTitle': [
    'LeBron /99 Prizm PSA 10 Sold $320',
    '[Seller B] - Waddle Prizm Silver RC$13.00',
]})
parsed = df['RawTitle'].apply(parse_title).apply(pd.Series)
df = pd.concat([df, parsed], axis=1)

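The nickname problem mentioned above was handled with lookup rules rather than regex. Here's a minimal sketch of that side; the alias table below is illustrative, not the actual mapping I used:

```python
# Illustrative alias table: a few shorthand names mapped to full names.
# The real set of rules was larger and grew as parser output got reviewed.
PLAYER_ALIASES = {
    'dhop': 'DeAndre Hopkins',
    'bron': 'LeBron James',
    'stroud': 'C.J. Stroud',
}

def normalize_player(name_fragment):
    """Map a nickname or shorthand to a full player name when a rule exists."""
    key = name_fragment.strip().lower()
    # Fall back to the raw text when no alias rule matches
    return PLAYER_ALIASES.get(key, name_fragment.strip())
```

The fallback matters: an unmatched name passes through untouched, so a missing rule shows up as an inconsistency in the output review rather than silently dropped data.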
PSA barcode autofill tool

Cleaning up the old data was only half the job. I also needed a way to keep new records clean going forward, because typing card details by hand is slow and that's usually where inconsistencies start creeping back in.

PSA graded cards already have a certification barcode on the slab, so I built a tool around the PSA API. Scan the barcode, pull back the official card details, and drop them right into the intake form. That cut down a lot of manual entry and made the data more consistent from the start.

I also bought barcode scanners for the team, so the intake workflow looked like this:

1. Scan the barcode on the PSA slab
2. PSA API returns the official card data: player, title, set, and grade
3. Form auto-fills with the PSA information
4. User adds business data: purchase price, seller, any notes
5. Click save and the card goes into the database with cleaner, more standardized fields

The biggest win here was consistency. If the card came from PSA, the naming and grade info came straight from their records instead of depending on how somebody typed it in.
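The lookup-and-autofill part of that workflow can be sketched roughly like this. The endpoint path and response field names here are assumptions for illustration, not confirmed PSA API details; check PSA's current API documentation before relying on them:

```python
import json
import urllib.request

# ASSUMPTION: endpoint path and payload shape are illustrative, not verified
PSA_CERT_URL = "https://api.psacard.com/publicapi/cert/GetByCertNumber/{cert}"

def fetch_cert(cert_number, token):
    """Look up a scanned cert number against the PSA API (requires an API token)."""
    req = urllib.request.Request(
        PSA_CERT_URL.format(cert=cert_number),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def cert_to_form_fields(payload):
    """Map an API payload onto the intake form's fields (field names hypothetical)."""
    cert = payload.get("PSACert", {})
    return {
        "player": cert.get("Subject"),
        "grade": cert.get("CardGrade"),
        "year": cert.get("Year"),
        "set": cert.get("Brand"),
    }
```

Keeping the payload-to-form mapping in its own function made it easy to handle the no-match case: an empty payload just produces a form with blank PSA fields instead of crashing the intake flow.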

[Image: PSA-graded LeBron James trading card showing the certification barcode on the slab]
Cloud storage and reporting

After cleaning everything up, I moved the inventory into Google BigQuery. SQLite worked fine while I was building and testing things, but it made more sense to put the shared dataset somewhere the team could query without passing files around.

Once the data was in BigQuery, it got a lot easier to answer normal business questions across the full inventory. Things like average sale price for a certain player and grade, or which cards had moved up the most since they were bought, went from being annoying to pretty straightforward.
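As a sketch of the kind of question this made easy, here's the local pandas equivalent of an "average sale price by player and grade" query; in BigQuery it's the same GROUP BY expressed in SQL. The column names follow the structured table above, and the rows are toy data:

```python
import pandas as pd

# Toy rows shaped like the structured inventory table
sales = pd.DataFrame({
    "Player":  ["LeBron James", "LeBron James", "LeBron James", "Joe Burrow"],
    "Grade":   ["10", "10", "9", "9"],
    "SoldFor": [440.0, 106.0, 74.0, 54.0],
})

# Average sale price for each (player, grade) pair
avg_price = sales.groupby(["Player", "Grade"])["SoldFor"].mean()
```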

The historical pricing side mattered too. Instead of going mostly off instinct, you could look at real sales over time and get a better read on whether a card looked underpriced, overpriced, or probably worth holding.

Example: tracking a card's value over time

Here's a simple example of what that looked like in practice. This is sample data, but it shows the kind of view I wanted: buy price, market movement, and what happened by the time the card sold.

2018 Panini Revolution LeBron James (PSA 10 · /99 · Los Angeles Lakers)
Bought at: $182
Sold at: $252
Return: +$70 (+38.5%)
Hold time: 10 months

Sample data for illustration: purchase price, market movement, and final sale over a 10-month hold.
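The return figures come straight from the buy and sell prices; a quick sketch of that arithmetic:

```python
def holding_return(bought_at, sold_at):
    """Dollar profit and percent return for a single card."""
    profit = sold_at - bought_at
    pct = profit / bought_at * 100
    return profit, round(pct, 1)

# The sample card above: bought at $182, sold at $252
profit, pct = holding_return(182, 252)  # -> (70, 38.5)
```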

Tools used
Python · Pandas · SQL · SQLite · Google BigQuery · PSA API · Regular Expressions · Database Design · Barcode Scanner Integration · Automation
What I learned
- Real-world data is way messier than class examples. Cleaning it forces you to understand how the data is actually being entered, not how you wish it looked.
- Regex is useful, but it only gets you so far if you are not checking real outputs. Having a way to review edge cases after each pass saved a lot of cleanup time.
- API integrations are usually pretty manageable once you understand the auth flow and response format. The trickier part was dealing with scans that came back with no match or incomplete data.
- Moving from SQLite to BigQuery was mostly a tooling and workflow change. The SQL is similar, but having a shared cloud dataset makes a big difference once more than one person needs access.
- Structured data really does change decision-making. Once there was actual price history to look at, buying decisions got less random and more consistent.