Create model for dissecting the sig field into computable structured data #689

Open
cgreich opened this issue May 11, 2024 · 14 comments

@cgreich
Contributor

cgreich commented May 11, 2024

This is a placeholder, so a proposal could be worked out:

Instead of the free-text sig field (which may even be in a non-English language), we need to have the actual frequency information. So, "2 mL 3 times a day" would become something like:

sig_amount - numeric (2)
sig_unit_concept_id - concept (concept_id for liquid units, most often "mL")
frequency - numeric (3)
frequency_unit_concept_id - concept (concept_id for "day")
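
To make the shape of this placeholder proposal concrete, here is a minimal sketch of the four fields as a record. The class name `StructuredSig`, the helper method, and the negative concept ids are hypothetical stand-ins for illustration only; they are not real OMOP concept_ids or agreed CDM names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuredSig:
    """Hypothetical record mirroring the four proposed fields."""
    sig_amount: float               # amount per administration, e.g. 2
    sig_unit_concept_id: int        # concept for the amount unit, e.g. "mL"
    frequency: float                # administrations per frequency unit, e.g. 3
    frequency_unit_concept_id: int  # concept for the time unit, e.g. "day"

    def amount_per_frequency_unit(self) -> float:
        # Total amount taken per frequency unit (per day in the example).
        return self.sig_amount * self.frequency


# Placeholder ids, NOT real OMOP concept_ids.
ML_PLACEHOLDER = -1
DAY_PLACEHOLDER = -2

# "2 mL 3 times a day"
example_sig = StructuredSig(2.0, ML_PLACEHOLDER, 3.0, DAY_PLACEHOLDER)
print(example_sig.amount_per_frequency_unit())
```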

@marcel1334

My suggestion would be to skip the sig_unit_concept_id attribute. Just like the existing quantity has the unit of the drug_strength denominator, we can use the same unit for the sig_amount. This will make calculations a lot easier since several numbers are all in the same unit.

@cgreich
Contributor Author

cgreich commented May 13, 2024

Makes sense, @marcel1334. As I said, this is not thought through yet.

Also, we may need to cut frequency into numerator and denominator, to allow for "1 tablet every other day".

@marcel1334

It's a feature I wanted to bring up some years ago as well, so I'm happy to see that it's on the table now. Thanks.

Regarding the "1 tablet every other day": in the Netherlands we have a special frequency table for the time units, and "per 2 days" is one of the possible units. Here we have 4 base attributes: frequency, frequencyunit, amount, amountunit (the last one suggested to be skipped in the CDM), plus an additional text attribute that can hold optional dose instructions. Example: "ZN 1KDUB". ZN is the code for "if needed" and 1KDUB means "first time double dose". It can also include codes for special dose schemas. Here in the Netherlands it's a simple text value, not database normalized, but it works well enough. Some of these special values are needed in the dose/day and/or duration calculations. Normalized would mean an extra table where every code is another record, but I think leaving the special codes out or using the text variant would work.
Because not all instructions will end up in structured form, I think we need to keep the current sig attribute, or rename this to source_sig.

@marcel1334

Or do you mean to use "numerator and denominator" instead of the "timeunit" attribute? So that "1 per 2 days" is just 1/2, "2 times per month" is 2/30, and "5 times per year" is 5/365. In that case, we do not need a timeunit as a concept and a lookup table to convert "per week" to 7 days in our calculations. Then we can simply calculate with the available numbers. I would vote for this.
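
A small sketch of the numerator/denominator idea, assuming the denominator is always expressed in days (30 for a month and 365 for a year, as in the examples above). The function names are made up for illustration.

```python
from fractions import Fraction


def frequency_per_day(numerator: int, denominator_days: int) -> Fraction:
    """Administrations per day, stored as numerator / denominator (in days)."""
    if denominator_days <= 0:
        raise ValueError("denominator_days must be positive")
    return Fraction(numerator, denominator_days)


def amount_per_day(amount_per_administration, frequency: Fraction):
    """Average amount per day; no time-unit lookup table is needed."""
    return amount_per_administration * frequency


every_other_day = frequency_per_day(1, 2)     # "1 per 2 days"
twice_a_month = frequency_per_day(2, 30)      # "2 times per month"
five_times_a_year = frequency_per_day(5, 365) # "5 times per year"

# "1 tablet every other day" averages half a tablet per day.
print(float(amount_per_day(1, every_other_day)))
```

Note that `Fraction` reduces 2/30 to 1/15. If the original numerator and denominator have to stay visible, they would need to be stored as two separate columns rather than as one reduced value.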

There are a lot of dynamics in dose instructions. Putting everything in structured attributes will need a very complex set of attributes/tables. Another thing that I see in our data is "3-4 times per day 2-4 tablets". We take the averages and simply calculate with these (in this example 3.5 x 3 = 10.5 tablets per day). Some of the special dose codes we use in the ETL calculations are things like "first time double dose" or the famous "3 weeks and 1 stop week". Sometimes in our research we adjust for the "if needed" part, but in most cases we ignore this. Other examples we ignore are "use before/after/during the meal", "use with water", etc. Two other very frequently used codes are "as known", where we look for the dose instructions in previous prescriptions, and "see product instructions", where we use the DDD from WHO. The original dose instructions including the details can be kept in the source_sig.
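
The range averaging described above can be sketched roughly as follows. This is an illustration, not the actual IPCI ETL code; it assumes ranges are written as "low-high" with plain numbers, and the function names are hypothetical.

```python
import re

# Matches "3", "3.5" or a range such as "3-4".
_RANGE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:-\s*(\d+(?:\.\d+)?))?\s*$")


def midpoint(text: str) -> float:
    """Return the average of a 'low-high' range, or the value itself."""
    match = _RANGE.match(text)
    if match is None:
        raise ValueError(f"not a number or range: {text!r}")
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) is not None else low
    return (low + high) / 2


def tablets_per_day(times_per_day: str, tablets_per_time: str) -> float:
    """Average daily amount from two possibly ranged values."""
    return midpoint(times_per_day) * midpoint(tablets_per_time)


# "3-4 times per day 2-4 tablets" -> 3.5 x 3 = 10.5
print(tablets_per_day("3-4", "2-4"))
```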

For our IPCI database we already extract the attributes from the incoming free-text instructions in our "non-CDM" database (we have used this in our research for many years). So from an ETL point of view, we will be able to fill this for most of our drug_exposures.

So far some thoughts from our side. Looking forward to some additional attributes in drug_exposure to store the dose instructions in a structured form.

Cheers, Marcel

@cgreich
Contributor Author

cgreich commented May 13, 2024

Yes, all these need to be considered. There are:

  • The "1 tablet per 2 days". The question is do you store both the 1 and the 2, or do you just store 0.5.
  • The "2-4 tablets". We could ignore them or leave them in, in which case I would take the average and a plus/minus. Easier to operate with.
  • The "before meal". That has no effect on the dose (or it has, but we don't know what the patient really did), so I agree with ignoring them.
  • The "as needed". We may want a flag like that, because you could model it and come up with an estimation across all users.
  • The "3 weeks and stop for one". This is a bigger deal, because it is the norm for chemotherapy regimens. If we want to support that we would need a more complicated model. The question is whether this is data or reference data. I am not sure.
  • The verbatim text strings. I would relegate that to a sig_source_value field, because it is not useful in a network. Plus, we want to get rid of text fields in the OMOP CDM.

@MelaniePhilofsky
Collaborator

If people are going to parse the sig field, shouldn't the data be put into the fields already in drug exposure table?

Example: take two pills twice a day for one week.
Quantity = 28
start date - end date = 7 days

The meaning is the same.

@cgreich
Contributor Author

cgreich commented May 13, 2024

Totally. These are connected. Essentially: Days_supply = quantity / frequency per day.

The problem we are solving is that we get 2 out of 3 (sometimes 3 out of 3) in the data. In US prescriptions, it is days_supply and quantity. But in other countries that may be different, and you get the frequency (from the sig) and either days_supply or quantity. We want to have all such cases covered and always be able to calculate the dose.
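
The "2 out of 3" situation can be sketched as a small helper that fills in whichever value is missing. It follows the simplified relation stated above, where `per_day` means the total amount taken per day in the same unit as `quantity`. The function name and the sample numbers are illustrative assumptions, not a proposed ETL rule.

```python
from typing import Optional, Tuple


def complete_triple(
    quantity: Optional[float],
    days_supply: Optional[float],
    per_day: Optional[float],
) -> Tuple[float, float, float]:
    """Derive the missing one of (quantity, days_supply, per_day)."""
    given = sum(v is not None for v in (quantity, days_supply, per_day))
    if given < 2:
        raise ValueError("need at least two of the three values")
    if quantity is None:
        quantity = days_supply * per_day
    elif days_supply is None:
        if per_day <= 0:
            raise ValueError("per_day must be positive")
        days_supply = quantity / per_day
    elif per_day is None:
        if days_supply <= 0:
            raise ValueError("days_supply must be positive")
        per_day = quantity / days_supply
    # If all three were given they are returned unchanged (no consistency check).
    return quantity, days_supply, per_day


# US-style input: quantity and days_supply known, daily amount derived.
print(complete_triple(30, 10, None))
# Sig-style input: daily amount and quantity known, days_supply derived.
print(complete_triple(30, None, 3))
```

When all three values are present, comparing them against each other is one way the extra field could help with debugging.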

It may also help with debugging.

@MelaniePhilofsky
Collaborator

Then why do we need:

"sig_amount - numeric (2)" Isn't this number the quantity?

And "frequency" can be derived from the start date - end date (both mandatory fields) or days supply field and the quantity field. Or days supply can be derived if we have quantity and frequency.

@cgreich
Contributor Author

cgreich commented May 13, 2024

No. The quantity is the total quantity handed out to the patient for a period of time (days_supply). The sig_amount is the "3" in a sig "Take 3 times daily".

But you are right. A simplified proposal could be to get rid of the sig, use the source string to parse out the frequency and fill in days_supply and quantity. The problem is that the sig is often not that clean. See above for all the funny situations ("2-4 tablets", "as needed", etc.). Do we want to burden the ETL schmuck with figuring all that out?

@tiozab

tiozab commented May 13, 2024

I like the idea that information from the sig trumps quantity and duration derived from other information, as @MelaniePhilofsky did; especially for dose, sig is the most reliable information. However, the "for how long" may not always be available in the sig fields (in which case the duration has to be derived from other fields). In that way, days supply and duration may not always need to be the same value, especially if we have the information that the package would have lasted for 14 days (days supply), but only 10 days of use were prescribed (= 10 days of duration).

I also like to keep the source sig because it is often nice to double-check what the source was.

Moreover, I think it does not hurt to create additional fields and spread the source sig information out over two more fields (only two if we standardise the information, more if we don't).
Possible standardisations:
"Number" of "something id" per day
The "something" being a dose form or volume (using the rule of thumb that around 20 drops equal 1 ml).
E.g. "2" "tablets" per day
E.g. "2" "ml" per day

@marcel1334

  • The "1 tablet per 2 days". The question is do you store both the 1 and the 2, or do you just store 0.5.
    This was my first thought as well. But maybe it's important to know whether something is "every day 0.1ml" versus "once a month 3ml". The average per day will be the same, but it may be an important difference. Storing these as "0.1 / 1" vs "3 / 30" makes it explicit and very easy to do calculations. Even "2 per 6 hours" can be stored as "2 / 0.25".

  • The "3 weeks and stop for one". This is a bigger deal, because it is the norm for chemotherapy regimens. If we want to support that we would need a more complicated model. The question is whether this is data or reference data. I am not sure.
    We don't have chemotherapy prescriptions in our GP database. In our database it's very common for oral contraceptives. I don't think such non-fixed dose instructions can be derived from the drug.

  • Totally. These are connected. Essentially: Days_supply = quantity / frequency per day.
    Agree with Theresa ("for dose, sig is the most reliable information"): calculations based on duration and quantity assume that these two attributes are correct and reliable. Since the end date in the CDM is mandatory and not always known in the source, our ETL makes some assumptions to calculate a duration and fill in the end date. Both duration and quantity have all kinds of issues and uncertainty. In situations where the duration is not that important but the dose per day is, having a more pure/native dose_per_day can make a difference.

Other examples related to this:

  • tablet 1 per day for 7 days, but the patient gets a box of 10 (because this is the number of tablets in a box)
  • especially with solutions and creams a patient often gets much more: 1ml per day for 10 days, but the patient gets a bottle of 30ml. Based on the duration and quantity the daily use would be 3ml per day (factor 3).
  • the patient gets a small amount of medication in case of asthma exacerbations. The duration often gets a default of 30 or 90 days. Hopefully the patient only uses it for a couple of days or never. Knowing that it's an "if needed" can be relevant.
  • instruction "max 4 tablets a day"
  • we also have many prescriptions with unknown quantity (and duration) that do have the dose instructions. Currently the ETL has a mechanism to come up with an end_date (because it is required) based on the ATC DDD and/or some metadata from z-index (our national drug_vocab). In this case the duration attribute in drug_exposure is empty.
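
The box and bottle examples above can be put into numbers with a short sketch. It compares the daily amount implied by quantity and duration with the daily amount stated in the sig; the function names are hypothetical and the inputs are the figures from the examples.

```python
def dispensed_daily_amount(quantity: float, duration_days: float) -> float:
    """Daily amount implied by what was handed out and for how long."""
    if duration_days <= 0:
        raise ValueError("duration_days must be positive")
    return quantity / duration_days


def overestimation_factor(
    quantity: float, duration_days: float, sig_amount_per_day: float
) -> float:
    """How much larger the dispensed-based estimate is than the sig-based one."""
    if sig_amount_per_day <= 0:
        raise ValueError("sig_amount_per_day must be positive")
    return dispensed_daily_amount(quantity, duration_days) / sig_amount_per_day


# Bottle of 30 ml, 1 ml per day for 10 days: estimate is 3 ml per day (factor 3).
bottle_factor = overestimation_factor(30, 10, 1)

# Box of 10 tablets, 1 tablet per day for 7 days.
box_factor = overestimation_factor(10, 7, 1)

print(bottle_factor, round(box_factor, 2))
```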

I would also vote to keep the sig as a source, and the same for the other source attributes in the other tables. Especially when calculations with the dose_per_day do not make any sense, the raw dose text can help to find out what's going on. Even if it is not in your own language, you can often recognise it. Having the source_text attributes is important. Otherwise we run fully blind on the structured data and cannot validate what's going on. There are also multiple studies where the source fields are used in the queries.

Just to make clear where I stand on this: I'm NOT trying to get a very, very complex database model to capture all the different situations I bring up. I just want to put them on the table to get more insight and help us come to a usable and pragmatic solution. If you do a study where the dose instructions are optional or give the patient freedom, you know the calculations will have issues. And for the asthma exacerbation drugs you also know how it works and have to adjust for this.

I think adding a single "amount_per_day" as a floating-point attribute, where the unit is the same as the drug_strength denominator (just like the quantity), will work in lots of situations and already make a lot of people very happy.

@cgreich
Contributor Author

cgreich commented May 15, 2024

Should we collect a good sample of sig strings and then come together to make a decision?

@marcel1334

I have 700,000+ unique sig texts for you. I will send the top 1000 (including the extracted parameters) to you by email.

@tiozab

tiozab commented May 16, 2024

I think it would be nice to have it from more than one database? I will also send @cgreich the top 1000 sigs from CPRD GOLD by email.

Projects
Status: Needs More Work
5 participants