Data Extracts: Pharmacokinetic (PK) reconciliation – what happens when your data has no Sample ID?

By Ian Mullan, A&O co-founder, 01 October, 2020.

Data Extracts - Lessons learnt is the first of a series of topics that will cover our insights and experiences of dealing with data, clients and ultimately, solutions!

There has been an increasing shift away from the inclusion of REFIDs* in Case Report Forms (CRFs). The same goes for fields such as a ‘Y/N’ on whether a sample was taken or not. But at what price?

*Also known as Sample IDs, Acquisition Nos, etc

Let's look at the apparent reasons for this:

  • studies are increasingly moving towards collecting only data that is analysed, eliminating items considered ‘administrative’;

  • too many data queries result from transcription errors when entering complex, coded, text strings which then, ultimately, do not align with the lab file.

Such decisions are presented as cost-saving initiatives, but we also know how readily our colleagues bow to pressure from irate investigators and field staff over data queries! So, such decisions are often taken with seemingly little to no consideration of the downstream impact.

What is the implication of removing lab ref IDs from the CRF for pharmacokinetics (PK)?

Data managers are typically tasked with PK reconciliation. In its simplest form: the CRF says a sample was taken, so check that it has been included in the lab report. The lab reference number should be present in both the CRF and the lab file. Essentially, it is the “linking” variable between two data sources. Once the two data sources are linked via a reference ID, many other checks can be performed (eg, does the subject ID match, do dates match, do sample draw timings (if collected in both datasets) match, is a sample still on-site sitting in the freezer behind the Häagen-Dazs ice-cream, to name a few; I will come back to this, below, under what is ‘true PK reconciliation’).
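To make the “linking” idea concrete, here is a minimal sketch in Python rather than Excel (purely my choice for illustration). The field names (`ref_id`, `subject`, `sample_taken`) and all values are hypothetical, not taken from any real EDC or lab extract.

```python
# Hypothetical extracts: field names and values are illustrative only.
crf_rows = [
    {"ref_id": "A001", "subject": "2333", "sample_taken": "Y"},
    {"ref_id": "A002", "subject": "2333", "sample_taken": "Y"},
    {"ref_id": "A003", "subject": "2410", "sample_taken": "Y"},
]
lab_rows = [
    {"ref_id": "A001", "subject": "2333"},
    {"ref_id": "A003", "subject": "2401"},  # subject ID disagrees with the CRF
    {"ref_id": "A009", "subject": "2500"},  # not recorded on any CRF
]


def reconcile_by_ref_id(crf_rows, lab_rows):
    """Compare two extracts that share a reference ID column."""
    # Index each source by reference ID; only CRF rows marked as taken count.
    crf = {r["ref_id"]: r for r in crf_rows if r["sample_taken"] == "Y"}
    lab = {r["ref_id"]: r for r in lab_rows}
    return {
        "missing_from_lab": sorted(crf.keys() - lab.keys()),
        "missing_from_crf": sorted(lab.keys() - crf.keys()),
        "subject_mismatch": sorted(
            k for k in crf.keys() & lab.keys()
            if crf[k]["subject"] != lab[k]["subject"]
        ),
    }


result = reconcile_by_ref_id(crf_rows, lab_rows)
print(result)
```

Each list is only a candidate for review: the entries still need checking against the source systems before anything is queried.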

Removing lab IDs without consulting the wider study team(s) can have a disastrous impact, not only on ultimately having clean data for PK analysis, but also on logistical costs such as repeated courier use because no one has a clue whether samples were taken or not, and from whom. A study with a primary outcome centred on PK can be severely hindered at interim analysis if samples simply are not available, for example. PK reconciliation can end up being an arduous task (as if it wasn’t bad enough already): flicking through EDC screens and trying to match entries in separate lab files. Believe me, I have seen this be the end result and I bet you have, too!

Much to my surprise, while I have seen some SAS attempts, case-by-case and study-by-study, to unify the two data sources, no one seemed to be using a globally available solution that did not require the services of either a SAS programmer or a J-Review/Spotfire/Tableau expert. Reconciliation had always been a DM task that, at least in my experience, didn’t necessitate the resources of an additional programmer, yet programming support in this area was consequently in greater demand. Indeed, reconciliation is often a data manager’s first introduction to Excel basics and VLOOKUP (of which, as an Excel expert…don’t go there!! I might write a blog on that one day). I was often left wondering why data management departments did not seek home-grown solutions from their own data managers; by that, I mean data managers possessing at least intermediate Excel skills.

So what is the solution? Indeed, is there a solution?

In an ideal world, a data manager can be a data manager, and a programmer can be a programmer. My personal belief, and my approach over the last 13 or so years to things like PK reconciliation, is that a data manager can learn some intermediate programming in MS Excel, freeing up much-needed time for the ‘official’ programmer. Now that we are stuck without ref ID codes, WE NEED TO MAKE OUR OWN, and the data manager with a grasp of simple programming techniques can take such a step!

Raw CRF data extracts into Excel, obtained either via the EDC reporting facility or via minimal manual effort to combine 1 or 2 extracts, should include the subject ID, visit ID, and exposure timepoint (eg, am or pm dose). For ‘true PK reconciliation’, the extracts also need to include the exposure start times (and end times, for infusion studies).

Concatenating (ie, joining) these variables will create a unique code for a single study drug dose. For example, patient 2333 at visit 4 morning dose could be coded as 2333visit4am, by joining cell values together. The corresponding lab file may have patient 2333’s visit 4 morning dose coded as 2333, V4, Morning, all in separate columns, or may even have a unique code such as ‘V4PreD’; it entirely depends on your lab provider, but I think you can see where I am going with this: we have worked backwards to create codes that enable the two data sources to ‘speak’ to one another once again! While CRF code 2333visit4am still does not match lab code 2333V4Morning, a mapping table and some basic Excel formulas bridge the gap, and we have resolved the missing lab ref ID issue.
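The same idea can be sketched outside Excel. The Python below is an illustration only: the function names and the contents of `MAPPING` are hypothetical, and your lab’s labels will differ. The mapping table translates the lab’s wording into the CRF’s wording so that both sides produce the same code.

```python
def crf_key(subject, visit, timepoint):
    """Join CRF fields into one code, e.g. 2333 + visit4 + am -> '2333visit4am'."""
    return f"{subject}{visit}{timepoint}".replace(" ", "").lower()


def lab_key(subject, visit, timepoint, mapping):
    """Translate the lab's own visit/timepoint labels to the CRF wording first."""
    crf_visit, crf_timepoint = mapping[(visit, timepoint)]
    return crf_key(subject, crf_visit, crf_timepoint)


# Hypothetical mapping table: (lab visit, lab timepoint) -> (CRF visit, CRF timepoint)
MAPPING = {
    ("V4", "Morning"): ("visit4", "am"),
    ("V4", "Evening"): ("visit4", "pm"),
}

crf_doses = [("2333", "visit4", "am"), ("2333", "visit4", "pm")]
lab_samples = [("2333", "V4", "Morning")]

crf_keys = {crf_key(*row) for row in crf_doses}
lab_keys = {lab_key(*row, MAPPING) for row in lab_samples}

linked = sorted(crf_keys & lab_keys)       # present in both sources
crf_only = sorted(crf_keys - lab_keys)     # dose on CRF, no lab record yet
print(linked, crf_only)
```

A lab label missing from the mapping table raises a `KeyError` here, which is deliberate in this sketch: an unmapped label is something you want to hear about, not silently skip.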

And now you wonder, maybe it was not a bad idea to get rid of ref IDs because for reconciliation, at least, they were not required. If only a backfill reconciliation solution was set up before data standards came along and ruined your day, huh?

But why stop just there? What about rolling up our sleeves and performing true PK reconciliation?

True PK reconciliation

As an experienced data manager, and dual programmer for the last 10 years (and as is A&O’s philosophy), I always ask: ‘what else can we get out of data?’ Re-establishing the link between the two files is great, especially to help coordinate sample shipment and tracking. But what about data quality? We know we should not wait for PK analysis to be performed and keep our fingers crossed, but this happens far too often. When impossible-to-use samples are then eliminated from analysis altogether, you have to ask why huge expense went into taking samples without adequate oversight. Not to mention the physical interventions patients had to endure. (Makes you wonder, in a patient-first culture, why no one really thought about the downstream impacts of removing ref IDs!)

Patients deserve much better than this: they deserve to have every procedural intervention contribute to a study’s outcome, potentially for their own good and that of future patients. This boils down to data quality. It is certainly one area where a data manager gets as close as possible to ethically ensuring a patient’s participation was not in vain.

True PK reconciliation requires calculating whether samples were taken per protocol. Was the sample labelled 'Predose' really taken before the study drug was administered? Was that 15-minute post-dose sample taken, as the protocol requires, 12 to 18 minutes after the study drug was taken? Because the files are linked up again via codes, you can cross-reference the study drug intake time against the sample draw time. You become empowered to raise relevant queries, and you start to see when samples might have been switched.

Of course, this can be done manually, but remember I mentioned ‘mapping table’ earlier? We can go one step further and add, to our already-created mapping table, WHEN samples SHOULD be taken in relation to when the study drug was taken.
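As a rough sketch of that extended mapping table, again in Python for illustration: `WINDOWS`, the labels, the timestamp format and the function name are all assumptions of mine. Only the 12 to 18 minute window comes from the example above.

```python
from datetime import datetime

# Hypothetical mapping-table columns: sample label -> allowed window in
# minutes relative to dose start (None = no limit on that side).
WINDOWS = {
    "Predose": (None, 0),   # at or before dosing; tighten if "at" is not acceptable
    "15min": (12, 18),      # the 12 to 18 minute example from the text
}

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format of both extracts


def check_timing(label, dose_time, draw_time, windows=WINDOWS):
    """Return (status, elapsed_minutes), status being 'OK', 'EARLY' or 'LATE'."""
    dose = datetime.strptime(dose_time, FMT)
    draw = datetime.strptime(draw_time, FMT)
    elapsed = (draw - dose).total_seconds() / 60
    low, high = windows[label]
    if low is not None and elapsed < low:
        return "EARLY", elapsed
    if high is not None and elapsed > high:
        return "LATE", elapsed
    return "OK", elapsed


print(check_timing("Predose", "2020-10-01 08:00", "2020-10-01 07:50"))
print(check_timing("Predose", "2020-10-01 08:00", "2020-10-01 08:05"))
print(check_timing("15min", "2020-10-01 08:00", "2020-10-01 08:20"))
```

A ‘Predose’ sample flagged LATE next to a post-dose sample flagged EARLY for the same dose may hint at switched labels, but that is a prompt for review, not a conclusion.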

We can go a step further and add another data source, such as IWRS, to bring an even more 'live' component (this can greatly assist CRAs in showing whether samples would have been taken for very recent visits). But let us stick to the above model for now.

This results in an extremely powerful, almost automatic, overview of not only when samples might have been mixed up and mis-labelled, but also how well or poorly sites are performing, or whether the protocol was too strict about what site staff can realistically do in a busy clinical setting. You are already on your way to establishing PK protocol violations (PVs), site performance metrics and quick-fire listings for sample shipment coordination, and can feed into risk-based monitoring and future site selection decisions. You will quickly become new best friends with other departments, instilling a new faith in data management when most of them do not even know why the data management department exists!
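One possible way to turn per-sample results into a site-level metric is sketched below in Python. The input rows, site names and the choice of “share of samples outside window” as the metric are hypothetical; your own KPIs may well be defined differently.

```python
from collections import defaultdict

# Hypothetical output of a timing check: (site, sample label, status)
checked = [
    ("Site 01", "Predose", "OK"),
    ("Site 01", "15min", "LATE"),
    ("Site 01", "15min", "OK"),
    ("Site 02", "Predose", "LATE"),
    ("Site 02", "15min", "LATE"),
]


def deviation_rates(rows):
    """Return {site: share of samples whose status is not 'OK'}."""
    totals = defaultdict(int)
    deviations = defaultdict(int)
    for site, _label, status in rows:
        totals[site] += 1
        if status != "OK":
            deviations[site] += 1
    return {site: deviations[site] / totals[site] for site in totals}


rates = deviation_rates(checked)
# List the worst-performing sites first.
for site, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{site}: {rate:.0%} of samples outside window")
```

If every site shows a high rate for the same timepoint, that may say more about the protocol window than about the sites.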

But in terms of data quality, a data manager can quite literally, in blinded fashion, aid in ensuring the PK analysis goes from this to this:

Now if you have reached these dizzy heights in creating such a tool, then there is one other factor worth mentioning: because you have used a system-independent tool (Excel), with a mapping table to boot, you have also created a tool that will run on ANY data source, and you no longer have that SAS programmer setting stuff up study-by-study. Remember: one day some bright spark will bulldoze the department and suggest a switch from InForm to Rave….no problemo, you have got it covered!

But you will not be without some hurdles and things to consider. Let’s look at what you are up against next….

What about validation?

In an industry requiring compliance, where every breath requires documenting (almost!), validation is an important factor and is going to depend on your company’s own internal policies and systems. Should a courageous colleague tackle automated efforts in PK reconciliation – note I am no longer talking about lab ref ids here (we’re beyond that now) – then here are some pointers:

  1. managers: give some space, peace and quiet to this individual attempting this;

  2. it will become immediately apparent if any DM-written formulas have been mis-applied [far too many errors requiring revision!];

  3. there’s nothing stopping you from taking a large cross-section of output, QC’ing it against the 21CFR Part 11 compliant system the data was sourced from, and documenting outcomes/corrective actions;

  4. a key point in all of this: no query should EVER be raised without the reviewer going back into the validated, 21CFR Part 11 compliant, systems / data sources to check a data issue still exists (this is a must-do practice no matter who provided a tool…SAS, Tableau, I-Review / J-Review, Spotfire, Spitfire, Timbuktu). A data manager should never raise a query just from referring to a listing. The listing could be out of date (or even inaccurate!!) or the data may have been updated in the EDC just 5 minutes ago!;

  5. to be extra nit-picky here: was the DM who did it manually, holding lab print-outs against the EDC screen, doing it in a ‘validated’ way? How do you demonstrate this on inspection day?;

  6. was the other method (manual or via programmed listings) validated in any EXTRA manner (see point #3 above)?;

  7. the heavily burdensome, manual PK reconciliation required by the Data Management Plan (DMP), to evidence that the task happened by more ‘conventional’ methods, could still be undertaken to cover all bases, but by the time this moment arrives you are highly likely to have captured everything already;

  8. in summary, no resultant FINAL action is ever taken OUTSIDE of a 21CFR Part 11 compliant system, and your own internal practices almost certainly can be modified or continued in a complementary, well-documented, manner.

What about security?

In providing this method of PK reconciliation for previous clients, this question has come up numerous times.

In this day and age, the work environment is very secure, with VPN networks and encrypted computing equipment. Any DM-self-administered method of tracking PK reconciliation is evidently performed in that same environment. It is no less secure than the use of other listings, data tools, extracts, screen clippings sent over email, etc.

Indeed, I would argue that using a single, key tool that centralises lab PK data, CRF data, and protocol requirements all-in-one, run by a Subject Matter Expert (SME), only serves to reduce random use of other, uncoordinated, communication methods and tools. I would go further and suggest that any sample shipment coordination with CRAs is performed by providing files on agreed secure networking platforms such as SharePoint-type sites.

There is always the risk an inattentive employee may send Excel creations beyond the corporate firewall. But Excel tools can be saved as password-protected files, and can also be saved as macro-enabled files that will only run if they detect they are sitting in your company’s VPN environment (a future blog on this, perhaps). Only being operable in your company’s VPN environment adds an extra layer of security should a rogue, parting, employee wish to take such creations with them. So that leaves only the very determined hacker to crack such a creation, but then this is true of any type of tool.

If you go down a macro-enabled route, it may be your company policy that eTMFs do not permit macro-enabled files to be stored within. The answer to that is to simply save extracts as xlsx (non-macro-enabled) files.

I sit here performing something akin to a Gallic shrug as I type this: there’s not much more I can really add to this.

What about blinding?

That’s always a concern, and is not too dissimilar to points covered under Security.

On blinded studies, of course study personnel must remain blinded to results of PK analytes. This goes without saying. It also goes without saying that the lab source file, containing the codes we have covered in this article and the sample draw times, is the SAME ONE as would have been provided and used had a DM not set up a more automated method. Keep in mind that the lab source file is coming from a 21CFR Part 11 compliant system and all measures concerning the assurance of blinding are taken by the third-party provider (ie, the central lab system); any failure to maintain blinding has nothing whatsoever to do with a data manager attempting a more automated approach to PK reconciliation.

I would always retort to allegations of risking unblinding like this: by introducing a reproducible tool that progressively is faster to set up as each new study starts, that is central and controlled by a SME, should the 21CFR Part 11 compliant system be found to be spurting out unblinded results, then the issue was discovered MUCH SOONER thanks to attempting to improve PK reconciliation. The sooner the better for such mishaps, evidently.

So a word of advice on this topic of blinding: never use sentences like “this is great, my team now have their eyes opened!!” or “I can finally see what’s going on with PK!!”, because someone, somewhere, will be that deer/rabbit caught in the headlights.

It’s always about the patients!

So you can see, we have gone on a bit of a journey here. I started out tackling the tricky issue of how to reconcile after that ‘fantastic’ decision to get rid of lab ref IDs, all the way to how I feel – how A&O feels – PK reconciliation must truly be performed. By doing so, you not only serve your study team efficiently, bringing newfound quality, without doubt bringing study costs down in the process, but most importantly of all: the patient gave their time, their consent, their discomfort, in providing valuable samples. So do a service for your patients.

How can we help you?

For more information on PK reconciliation see: or

If your organisation needs assistance with PK reconciliation, please contact A&O. We can provide your department with additional PK tool features including:

  • Ability for any user to self run (feed in EDC, lab file)

  • Automatic review

  • Ability to store and retain user comments = no need to re-review ALL data again

  • Provide 'live' data overview

  • Portfolio-wide PK KPI dashboard

Would you like to learn more? Please get in touch

© 2017-2020 Meijboom Consulting Ltd trading as Apples & Oranges Data Solutions