Your lab results aren’t wrong. Your software might be combining them wrong.

I’m Fillip Kosorukov. I’m not a doctor and nothing here is medical advice. I’m someone who got serious about tracking his own bloodwork, pulled years of lab results into one place to look at the trends, and found that the hardest part wasn’t the biology. It was that my software couldn’t reliably tell when two results were the same test. The fragile part wasn’t the lab value at all. It was the software’s definition of “same.”

The trend that didn’t make sense

The whole reason to centralize your labs is the trend line. A single cholesterol number or vitamin level on its own tells you very little. The same marker measured every few months for three years tells you whether something is drifting, and how fast. That’s the entire payoff of owning your data instead of letting it sit in five different lab portals.

So I built myself a view that lined up each marker over time. And some of the trends were nonsense. A value would appear to drop sharply and then jump back, when I knew from the original reports that it had been stable. Some markers showed up twice in the same month with slightly different values, as if I’d been tested twice. Others I knew I’d had measured simply weren’t there at all.

The data wasn’t wrong. Every individual result, on its original report, was correct. What was wrong was the stitching — the logic that decided “this result from March and this result from June are the same marker, so put them on the same line.”

The same test isn’t labeled the same way twice

Here’s the part that surprised me. The same test does not arrive labeled the same way everywhere. LOINC exists precisely to make lab tests comparable across providers, and it helps — but it isn’t the clean universal key you’d hope for. The same underlying marker can show up under more than one LOINC code depending on the lab, the method, or the panel it rode in on.

Concretely: my cholesterol panel came back under one LOINC code when one lab ran it and a different code when another did. To me they were the same measurement of the same thing. To my software, keyed on the code, they were two different tests. So the merge split one continuous history into two half-empty ones. In a couple of cases it did the reverse, folding two genuinely different markers together because they arrived under the same code even though the underlying tests differed.

The software had been merging results by matching on the LOINC code. It seemed like the obvious key; it’s literally the field that’s supposed to mean “what test is this.” But that assumption is exactly where the phantom drops and the duplicate months were coming from. The original reports were correct. My choice of identity key was the bug.
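To make the failure concrete, here is a minimal Python sketch of a code-keyed merge. It is an illustration under assumptions, not the code from my tracker: the `Result` fields, the lab names, the values, and the codes `CODE-1` and `CODE-2` are all made-up placeholders (they are not real LOINC codes).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Result:
    taken: date
    lab: str
    code: str    # placeholder identifier, NOT a real LOINC code
    name: str
    value: float
    unit: str


# Three hypothetical results for what a person would call the same marker.
results = [
    Result(date(2022, 3, 1), "Lab A", "CODE-1", "LDL Cholesterol", 112.0, "mg/dL"),
    Result(date(2022, 6, 1), "Lab B", "CODE-2", "LDL Chol Calc", 110.0, "mg/dL"),
    Result(date(2022, 9, 1), "Lab A", "CODE-1", "LDL Cholesterol", 111.0, "mg/dL"),
]


def group_by_code(rows):
    """Naive merge: build one trend line per code, in date order."""
    series = defaultdict(list)
    for row in sorted(rows, key=lambda r: r.taken):
        series[row.code].append((row.taken, row.value))
    return dict(series)


naive = group_by_code(results)
for code, points in naive.items():
    print(code, len(points), "point(s)")
```

With this input the naive merge produces two series, one with two points and one with a single point, instead of one three-point history. Nothing fails and nothing is flagged, which is why the problem only showed up as odd-looking charts.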

Rebuilding around identity, not convenience

The fix was to stop trusting the convenient field. I rebuilt the whole thing around one canonical concept per marker (“LDL cholesterol” as a single internal identity, decided once) and mapped every incoming result onto it explicitly, using the code plus the test name, the units, and which lab and panel it came from, instead of hoping the LOINC codes would line up on their own. LOINC went from “the key” to one clue among several. Every plotted point also kept a path back to its original report, so I could always check it.

When I rebuilt the history on that foundation, the phantom drops vanished, the duplicate months collapsed into single real data points, and the markers I knew I’d had measured reappeared where they belonged. A few hundred results, spread across several years, back on the right trend lines.

(I’ve written about the broader tracking setup at fillipkosorukov.net if you’re building something similar.)
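As a rough illustration of the explicit mapping described in this section, here is a minimal Python sketch. It is written under assumptions and is not the actual implementation: the `MAPPING` table, the `resolve` and `build_trends` helpers, the `report_id` field, and every code, lab name, and value are hypothetical. The panel is left out of the key for brevity, and setting unmapped results aside for review is a design choice of this sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Result:
    taken: date
    lab: str
    code: str       # placeholder identifier, NOT a real LOINC code
    name: str
    value: float
    unit: str
    report_id: str  # hypothetical pointer back to the original report


# Explicit, hand-reviewed rules: (lab, code, normalized name, unit) -> marker.
# The code is one clue among several, not the key on its own.
MAPPING = {
    ("Lab A", "CODE-1", "ldl cholesterol", "mg/dL"): "ldl_cholesterol",
    ("Lab B", "CODE-2", "ldl chol calc", "mg/dL"): "ldl_cholesterol",
}


def resolve(result):
    """Return the canonical marker for a result, or None if no rule matches."""
    key = (result.lab, result.code, result.name.strip().lower(), result.unit)
    return MAPPING.get(key)


def build_trends(rows):
    """Group results by canonical marker; collect unmapped rows for review."""
    trends, unmapped = {}, []
    for row in sorted(rows, key=lambda r: r.taken):
        marker = resolve(row)
        if marker is None:
            unmapped.append(row)  # surfaced for review, not silently dropped
            continue
        trends.setdefault(marker, []).append((row.taken, row.value, row.report_id))
    return trends, unmapped


results = [
    Result(date(2022, 3, 1), "Lab A", "CODE-1", "LDL Cholesterol", 112.0, "mg/dL", "report-001"),
    Result(date(2022, 6, 1), "Lab B", "CODE-2", "LDL Chol Calc", 110.0, "mg/dL", "report-002"),
    Result(date(2022, 9, 1), "Lab A", "CODE-1", "LDL Cholesterol", 111.0, "mg/dL", "report-003"),
    Result(date(2022, 12, 1), "Lab C", "CODE-1", "LDL Cholesterol", 2.9, "mmol/L", "report-004"),
]

trends, unmapped = build_trends(results)
```

Here the three mapped results land on a single `ldl_cholesterol` line, each point carrying its `report_id`. The fourth result, from an unknown lab in different units, goes into `unmapped` instead of being guessed at. Keying on the full tuple is deliberately strict: a new lab or unit needs a new reviewed rule before its results reach a chart.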

The trends that came out the other side were boring. The one or two markers that were actually drifting stood out clearly instead of being buried under noise the software had invented. For the first time the picture matched what my reports actually said, which is the only thing you want from a system like this.

This matters even if you never write a line of code

You probably won’t write merge logic for your own labs. But you might use an app that does it for you, and the lesson transfers directly: when you let a tool aggregate your health history, the most important and least visible thing it does is decide which results are “the same.” Get that decision wrong and it doesn’t throw an error. It shows you a clean, confident chart that happens to be fiction.

Any tool you trust with this is making that call invisibly, so it’s worth asking whether it gets it right. The most basic check: can you click any point on the chart and see the original report it came from? If not, you’re trusting a decision you can’t audit.


About the author

Fillip Kosorukov co-authored peer-reviewed behavioral-psychology research on protective behavioral strategies and motivational interviewing (Journal of Substance Use, 2023; PMID: 37275205) and holds a BS in Psychology (Summa Cum Laude) from the University of New Mexico. The same concern with measurement validity that runs through academic research is what sent him down this particular rabbit hole. He writes about applied decision-making and the systems people use to understand their own behavior and health. This article is about data integrity, not medical advice.

Elsewhere: ORCID · Google Scholar · Scopus · Web of Science · ResearchGate · Academia.edu · LinkedIn · Substack · fillipkosorukov.net · fillipkosorukov.me