![](https://blog.emergingscholars.org/wp-content/uploads/2025/02/comparison_1739330082-420x280.jpeg)
Content warning: discussion of substance use and overdoses
“Find the differences between these two pictures” is a classic children’s activity–and now a popular meme format from The Office. Maybe sometimes it is just something to pass the time, but one can certainly imagine that it’s good practice for focused attention and visual discernment, not to mention basics like counting and color recognition for the younger kiddos. Sometimes it can be a job for adults, too. Such was the nature of the task I was given a few weeks ago: compare two sets of classifiers for identifying overdose-related emergency department visits and enumerate the differences.
Developing and maintaining all sorts of classifiers along those lines is a major part of my job. In principle, there are only a limited number of reasons why folks seek healthcare at the emergency department. But influenza infections and myocardial infarctions and overdoses can present in multiple ways, depending on factors like the patient’s age and sex. Then there is the fact that patients can describe the same experience in different ways. For example, some patients with a personal or family history of heart disease might use more precise language while others might only be able to describe their symptoms informally. What we receive is a text record (“chief complaint”) of that information provided by the patient–as entered by a hospital employee who may make further editorial choices, not to mention spelling mistakes, when typing the entry.
As a result, if I want to identify all visits related to overdoses, I can’t just look for the word “overdose.” I also have to look for “poisoning” (but not food poisoning) and “od” (but not when it means “oculus dexter” or right eye), and for symptoms of overdoses like “patient unresponsive” (ideally in combination with a reference to substance use), and for indicators of overdose treatment like “administered narcan.” And our epidemiologist clients would like to know the substance(s) involved, so I also need to look for terms like “cocaine” but also informal terms like “crack” (but not cracks in ribs, sidewalks, or teeth) and “coke” (but not the soft drink). And so on.
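To make the shape of this concrete, here is a minimal sketch of a keyword classifier along these lines. The term lists and the exclude-before-include logic are illustrative assumptions, not our actual surveillance definitions:

```python
import re

# Illustrative term lists only -- real definitions are far larger and
# handle context (e.g. "od" as "oculus dexter") much more carefully.
INCLUDE = [r"\boverdose\b", r"\bpoisoning\b", r"\bod\b", r"\bnarcan\b"]
EXCLUDE = [r"\bfood poisoning\b", r"\boculus dexter\b"]

def classify(chief_complaint):
    """Flag a visit if any inclusion term matches and no exclusion term does."""
    text = chief_complaint.lower()
    if any(re.search(p, text) for p in EXCLUDE):
        return False
    return any(re.search(p, text) for p in INCLUDE)

print(classify("patient unresponsive, administered narcan"))  # True
print(classify("nausea after food poisoning"))                # False
```

Even this toy version shows why comparing definitions is hard: the interesting behavior lives in the interaction between the lists, not in any single term.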
Very quickly, we have gone from just a list of words that are easy to sort and compare to complex logic such as “this AND that BUT NOT this other thing” or “this IF IT IS CLOSE TO that.” The difficulty is that we now have multiple ways to express exactly the same logic. Just looking at the written logic side-by-side won’t necessarily make this equivalence apparent; the best way to identify functionally identical statements is to apply the logic to the data and compare the results. This also has the benefit of allowing us to assess the relative impact of differences. One classifier might have a significant number of unique logical expressions, but if the phrasings they match rarely or never appear in practice, those differences don’t matter much.
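As a toy illustration of that equivalence problem: the two patterns below are written differently, but they match exactly the same records, which only becomes visible when both are applied to data.

```python
import re

# Two textually different but logically identical expressions; the sample
# visits are made up for illustration.
expr_a = lambda t: bool(re.search(r"\bod\b|\boverdose\b", t))
expr_b = lambda t: bool(re.search(r"\b(od|overdose)\b", t))

visits = ["od on benzos", "overdose, gave narcan", "chest pain", "food poisoning"]
print([expr_a(v) == expr_b(v) for v in visits])  # [True, True, True, True]
```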
So, instead of just comparing the text of the logical expressions in the classifier definitions, I focused on comparing the results. The most straightforward way to do this is to look at the level of the full chief complaint text to see which ones the classifiers both identify and which ones are only identified by one or the other. Reviewing these lists reveals some general trends, but unfortunately the chief complaints for drug overdoses can venture into narrative territory, with details of amounts and routes of ingestion and sequences of events leading up to and following the realization that the patient needed medical attention. The unique features can obscure the commonalities of the classifier definitions we are trying to find.
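That comparison amounts to simple set arithmetic on the classified records. A minimal sketch, where the two classifier functions are toy stand-ins for the real definitions:

```python
# Compare two classifiers by their results rather than their text.
def compare(chief_complaints, classify_a, classify_b):
    """Split complaints into those caught by both classifiers or only one."""
    a = {cc for cc in chief_complaints if classify_a(cc)}
    b = {cc for cc in chief_complaints if classify_b(cc)}
    return {"both": a & b, "only_a": a - b, "only_b": b - a}

# Toy definitions: b also recognizes the abbreviation "od".
classify_a = lambda cc: "overdose" in cc.split()
classify_b = lambda cc: bool({"overdose", "od"} & set(cc.split()))

visits = ["overdose cocaine", "od benzo", "chest pain"]
result = compare(visits, classify_a, classify_b)
print(result["only_b"])  # {'od benzo'}
```

The `only_a` and `only_b` buckets are the interesting ones; as the post describes, the trouble is that real chief complaints in those buckets are long and idiosyncratic.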
My next thought was to break the text down into individual words and compare how often different words appear in the different groups, highlighting words that are frequent in one group but appear infrequently or not at all in the other. This does a pretty good job of highlighting the differences that come up often. For example, one definition looking for benzodiazepine overdoses includes “benzo” as a recognized abbreviation while the other does not; that abbreviation appears a couple hundred times in the data, so it stands out as common in one group and rare in the other. When we get to the differences that come up less often, however, the true differences get lost in a lot of noise.
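A rough sketch of that word-level comparison, using Python’s `collections.Counter`; the tokenization and the `min_count` threshold are illustrative simplifications:

```python
from collections import Counter

def word_freq_diff(group_a, group_b, min_count=2):
    """Flag words frequent in one group but absent from the other."""
    freq_a = Counter(w for cc in group_a for w in cc.lower().split())
    freq_b = Counter(w for cc in group_b for w in cc.lower().split())
    flagged = []
    for word in set(freq_a) | set(freq_b):
        if freq_a[word] >= min_count and freq_b[word] == 0:
            flagged.append((word, "a_only"))
        elif freq_b[word] >= min_count and freq_a[word] == 0:
            flagged.append((word, "b_only"))
    return flagged

# Toy data mirroring the "benzo" abbreviation example.
group_a = ["benzo overdose", "benzo od", "overdose"]
group_b = ["xanax overdose", "overdose pills"]
print(word_freq_diff(group_a, group_b))  # [('benzo', 'a_only')]
```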
![](https://blog.emergingscholars.org/wp-content/uploads/2025/02/comparison_1739331440-420x630.jpeg)
Using word frequencies in this way works best if we can treat sentences as if they are constructed by pulling words one at a time out of a hat. For some common words that get used in many different contexts, we can just about get away with pretending this is true. But some words just go together like peanut butter & jelly, so if you see one, you’ll probably see the other. And in a finite sample of texts like this, some uncommon words will just happen to occur together in the same chief complaint even though they are unrelated. The result in these cases is that my analysis can’t distinguish words that show up in only one group because they represent differences in the definitions from words that show up in only one group because they are semantically or grammatically connected to the definition’s words, or simply happened to co-occur with them. This confounding became most apparent to me when words were getting flagged that did not appear in either definition.
I also noticed words getting flagged as associated with one definition even though they appeared in both definitions, such as ‘Ativan’. This turned out to be due to the fact that the definitions can have logic for inclusion and exclusion. While both definitions for benzodiazepine overdose included Ativan, one definition also excluded chief complaints if they mentioned that the hospital staff had administered Ativan as a treatment while the other definition did not. As a result, there were a number of chief complaints classified by the second definition but not the first that included the word Ativan. The analysis was correctly identifying that there were differences connected with this word, but not making it clear whether this was due to inclusion or exclusion criteria. It did, however, help me recognize that certain phrases mattered, not just single words.
Tweaking this frequency analysis, for example by including all chief complaints and not just those classified by either or both definitions, could help in some cases. Primarily, it could substantially increase the pool of chief complaints, causing more words to behave as if they were pulled from a hat and cutting down on the number of words that co-occurred only by chance in the handful of times they appeared at all. But it also meant nearly all the words from each definition got flagged, even the ones they had in common, requiring additional, more complicated filtering rules. In theory, it also provides more data for looking at uncommon pairs of words, but in practice that bumped up against the limits of my computational resources.
Back to the drawing board again. The only trick I had left was one I was trying to avoid because I expected it to be fairly tedious and not yield results that were any more usable–but what do I know? This trick involved splitting the definitions up into separate “atoms” of inclusion and exclusion criteria, applying them to the complaints in each group and seeing which words or phrases get pulled out, then identifying the unique items pulled out by each definition. In principle, this isn’t guaranteed to work; the definitions can be arbitrarily long and complex such that separating out inclusion and exclusion criteria is not meaningful and/or the “atoms” might not be useful on their own. In this particular case, however, separation was possible and largely a task that could be automated. And the atoms generally contained word combinations when they needed to, such as “crack” or “coke” and the words that need to go with them to differentiate substance use from injury or soda consumption. The resulting output is still a lot to review, but interpretation is clearer–e.g. I can tell whether a word or phrase surfaced because one definition excluded it or the other included it–and there are no words flagged that aren’t actually in the definitions.
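A simplified sketch of that atom-level comparison, reusing the Ativan example from earlier. The `(polarity, pattern)` representation and these specific patterns are assumptions for illustration, not the production system:

```python
import re

def atoms_matched(atoms, chief_complaint):
    """Return the set of (polarity, pattern) atoms that fire on a complaint."""
    text = chief_complaint.lower()
    return {(polarity, pat) for polarity, pat in atoms if re.search(pat, text)}

def unique_atoms(atoms_a, atoms_b, complaints):
    """For each complaint, report atoms firing under only one definition."""
    diffs = []
    for cc in complaints:
        hits_a = atoms_matched(atoms_a, cc)
        hits_b = atoms_matched(atoms_b, cc)
        if hits_a != hits_b:
            diffs.append((cc, hits_a - hits_b, hits_b - hits_a))
    return diffs

# Definition A also excludes staff-administered Ativan; B does not.
atoms_a = [("include", r"\bativan\b"), ("exclude", r"administered ativan")]
atoms_b = [("include", r"\bativan\b")]
complaints = ["took too much ativan",
              "administered ativan for seizure",
              "chest pain"]
for cc, only_a, only_b in unique_atoms(atoms_a, atoms_b, complaints):
    print(cc, only_a, only_b)
```

Because each difference is now attributed to a specific atom, the output says directly whether an inclusion or an exclusion criterion is responsible, which is exactly the interpretability the word-frequency approach lacked.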
This iterative process has been the work of several weeks; it has been my main but not only project since the start of the year. So imagine my surprise and skepticism when, in half as much elapsed time (although obviously more person-hours), entire long-standing government departments and programs have been effectively shut down and core processes of other departments overhauled or disrupted. In such a short period of time, how can the consequences of these substantial changes to complex systems be assessed? How can we determine that the new arrangements are suitably equivalent or better? This hardly seems conservative. And given that the observed immediate consequences are loss of access to food and life-saving medicine, it is difficult to describe as compassionate either. I appreciate that these situations are still evolving and the worst consequences may yet be avoided; at the same time, this level of churn, confusion and disruption is hardly efficient, was fairly predictable, and could have been avoided with a more measured and considered approach.
Andy has worn many hats in his life. He knows this is a dreadfully clichéd notion, but since it is also literally true he uses it anyway. Among his current metaphorical hats: husband of one wife, father of two teenagers, reader of science fiction and science fact, enthusiast of contemporary symphonic music, and chief science officer. Previous metaphorical hats include: comp bio postdoc, molecular biology grad student, InterVarsity chapter president (that one came with a literal hat), music store clerk, house painter, and mosquito trapper. Among his more unique literal hats: British bobby, captain’s hats (of varying levels of authenticity) of several specific vessels, a deerstalker from 221B Baker St, and a railroad engineer’s cap. His monthly Science in Review is drawn from his weekly Science Corner posts — Wednesdays, 8am (Eastern) on the Emerging Scholars Network Blog. His book Faith across the Multiverse is available from Hendrickson.