Introduction

International guidelines for the management of neonates with unconjugated hyperbilirubinemia include treatment thresholds that are based on total serum bilirubin (TSB) concentrations. Bilirubin measurements are therefore key to the management of neonatal jaundice, and they must be accurate and precise if jaundiced newborn infants are to be managed appropriately. Over the past decades, bilirubin measurement for the identification of severe neonatal hyperbilirubinemia (SNH) has constituted a major challenge. A vast number of methods to determine bilirubin in human serum have been developed since it was first reported in 1858 by Frerichs (Gmelin reaction).1,2 Van den Bergh and Snapper described their important diazo reaction in 1913,3 and the colorimetric determination by Malloy and Evelyn was published in 1937.4 This was followed by Jendrassik and Grof, who refined and modified the diazo reactions in 1938.5 For the terminology of the science of measurement used in the subsequent paragraphs, including terms such as TSB accuracy, trueness, metrological traceability, and measurement uncertainty, we refer to the definitions provided by the International Vocabulary of Metrology.6

Diagnostic methods—invasive total serum bilirubin measurements

Multiparameter instruments

Routine laboratory bilirubin measurement is commonly performed with multichannel instruments that provide values for conjugated (direct), unconjugated (indirect), and TSB concentrations. They usually base the direct determination of bilirubin on diazo (Jendrassik-Grof) and vanadate oxidation chemical reactions, or on variants.7 On account of the high costs involved and the requirement of specialized personnel, these instruments are used mainly in laboratories of large hospitals. Even though the concentrations supplied by these instruments were considered the clinical “reference” for TSB, inconsistencies among the different laboratory methods have been observed for decades.8,9 In 2010, in the Netherlands, Cobbaert and colleagues analyzed the accuracy of TSB levels nationwide as measured by the most commonly used multiparameter instruments and in vitro diagnostic devices (IVDs).10 In this study, a pooled human serum was supplemented with unconjugated bilirubin to obtain target values of 26.7 µmol/L (95% confidence interval (CI) range of 26.1–27.3 µmol/L), and 68.7 µmol/L (95% CI range of 67.2–70.2 µmol/L), as assigned with the Doumas reference measurement procedure (RMP) of the Joint Committee for Traceability in Laboratory Medicine (JCTLM) listed reference laboratory in Hannover, Germany. The Doumas RMP can be considered the gold standard. The two value-assigned specimens were measured in 183 medical laboratories and in-house by IVD manufacturers using their respective routine methods. This procedure allowed the accuracy of results produced by Dutch medical laboratories and IVD manufacturers to be assessed and compared to the target values assigned with an internationally recognized reference method. The interlaboratory variability and inaccuracy of TSB levels observed among manufacturers and individual laboratories were substantial.
This indicates that the concept of metrological traceability, which leads to exchangeable TSB results, was not uniformly adopted in the commercially available IVDs. Also, in-house results produced by individual IVD manufacturers demonstrated significant differences. Similar discrepancies were observed by Greene and colleagues when comparing the performance of a Beckman AU680 instrument versus Ortho-Clinical-Diagnostics’ VITROS device.11

It is common in neonatal intensive care units (NICUs) to find blood gas analyzer instruments like Radiometer ABL models or the RAPIDPoint models produced by Siemens HealthCare Diagnostics Inc.12 Blood gas analyzer instruments estimate TSB indirectly by using whole blood co-oximetry, whereby whole blood is hemolyzed and serum equivalent bilirubin concentration calculated. Previously, the comparability of blood gas analysis-derived TSB levels and TSB levels measured on routine laboratory instruments was assessed. In 2018, Lano and colleagues reported a comparative analysis of neonatal TSB levels by whole blood co-oximetry (Radiometer® ABL90) against plasma bilirubin methods (Roche Diagnostics Cobas® C-601 and Ortho Clinical Diagnostics VITROS® 350).13 Results showed good correlation in comparison to the Roche plasma diazo method, with a mean bias of −1.0 µmol/L across the bilirubin range examined and a 95% confidence interval range of −20.00 to 19.00 μmol/L. However, a statistically significant underestimation was found against the VITROS® 350 machine with a mean bias of –4.4 µmol/L over the bilirubin range examined and a 95% confidence interval of –29.90 to 21.10 μmol/L. Performance of the GEM Premier 4000® blood gas analyzer (Instrumentation Laboratory, Bedford, MA) was also assessed showing a wide range of differences compared to VITROS, with a negative bias at low concentrations of bilirubin and a positive bias at higher concentrations. Moreover, hemoglobin concentration and hemolysis affected the recovery of the GEM blood gas analyzer results.14
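The mean bias and 95% ranges quoted in such method comparisons can be illustrated with a minimal sketch. It assumes the ranges are Bland-Altman limits of agreement (mean difference ± 1.96 SD), which the text does not state explicitly; the function name and the paired values are hypothetical and are not data from the cited studies.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (mean bias, lower limit, upper limit) for paired measurements.

    Assumes Bland-Altman limits of agreement: bias +/- 1.96 * SD of the
    paired differences (method_a minus method_b).
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired TSB values in µmol/L (illustration only)
co_oximetry = [150.0, 200.0, 250.0, 300.0]
laboratory = [152.0, 199.0, 253.0, 300.0]
bias, lower, upper = bland_altman(co_oximetry, laboratory)
print(f"bias = {bias:.1f} µmol/L, limits = {lower:.1f} to {upper:.1f} µmol/L")
```

A small mean bias with wide limits, as in the comparisons above, means that two methods agree on average while individual results can still differ by a clinically relevant amount.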

Bench-top bilirubinometers

Other commonly used instruments for TSB measurement are bench-top bilirubinometers. Based on direct spectrophotometry, these are simple and rapid alternatives for assessing TSB that require a minimal sample for analysis. In practice, undiluted serum is used to measure the absorbance of bilirubin (at 454 nm) and of hemoglobin (at 454 and 528 nm). Hence, subtracting the absorbance at 528 nm from that at 454 nm yields a value that can be attributed largely to bilirubin. The prevalence of other forms of bilirubin and of other chromophores in older children and adults limits the use of this technique to neonates younger than 2–3 weeks of age.15 Two types of direct spectrophotometry instruments are available commercially: those using sample cuvettes, such as UNISTAT (Reichert Technologies, USA), and those using hematocrit capillary tubes, like One Beam (Ginevri, Italy). Although many instruments based on direct spectrophotometry are commercially available, validation studies of this method are limited. The advantage of bilirubinometers is the short turnaround time for results, as shown in Table 1. However, the requirement of sample processing and the need for additional instrumentation, such as a centrifuge, and for trained laboratory personnel substantially limit the use of this method for TSB determination.16,17
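The two-wavelength correction described above can be sketched as follows. This is an illustration only: it assumes that hemoglobin contributes about equally at 454 and 528 nm, so that the difference is attributable largely to bilirubin, and it uses a hypothetical linear calibration factor. It is not the algorithm of any specific commercial instrument.

```python
def net_bilirubin_absorbance(a_454, a_528):
    """Subtract the 528 nm reading from the 454 nm reading."""
    return a_454 - a_528


def estimate_tsb(a_454, a_528, calibration_factor):
    """Convert the net absorbance to a TSB estimate in µmol/L.

    calibration_factor is a hypothetical instrument-specific constant
    (µmol/L per absorbance unit) obtained from a calibrator.
    """
    net = net_bilirubin_absorbance(a_454, a_528)
    if net < 0:
        raise ValueError("net absorbance is negative; check the readings")
    return net * calibration_factor


# Hypothetical readings and calibration factor (illustration only)
tsb = estimate_tsb(a_454=0.90, a_528=0.15, calibration_factor=400.0)
print(f"estimated TSB = {tsb:.0f} µmol/L")
```

Because the correction relies on only two wavelengths, other chromophores that absorb near 454 nm are not accounted for, which is consistent with the restriction of the technique to young neonates.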

Table 1 Characteristics of commonly used instruments to measure bilirubin concentrations in neonates.

Hand-held point-of-care (POC) bilirubin instruments

Neonatal jaundice identification has always posed a challenge, mainly in low- and middle-income countries (LMICs).18,19 In recent years, with the advancement of technology, different solutions have emerged for SNH screening. In 2017, Keahey and colleagues reported validation data of a new screening device under development, the BiliSpec, in 94 blood samples of 67 newborn infants between the day of birth and 24 days of age.20 This screening method is based on a battery-powered low-cost reader designed to quantify serum bilirubin levels from whole blood applied to a lateral flow card.20 From a maintenance point of view, a drawback is that this device requires daily calibration for routine use. The study showed a high correlation (r = 0.97) of BiliSpec against a bench-top bilirubinometer (UNISTAT; Reichert Technologies) within a TSB range from 19 to 393 µmol/L with a mean (± SD) value of 181 ± 68 µmol/L and differences up to 51 µmol/L (68% of total samples deviated ≤ 17 µmol/L). The mean bias between bench-top TSB and BiliSpec bilirubin readings was 5 µmol/L, with 95% limits of agreement of −29 to 38 µmol/L. In practice, lateral flow cards are designed to accept drops of whole blood obtained directly from a heel or finger prick. The separation from corpuscular components of the blood allows the flow of plasma into the nitrocellulose (NC) membrane by capillarity. Once the operator has visually confirmed that the NC membrane is saturated, the card is inserted into the reader for bilirubin measurement. The authors report that the design of the card allows controlling the volume of blood applied. The variability in bilirubin test results, however, could be the effect of an undersaturation or oversaturation of the NC membrane.
In recent years, TSB measurement by another POC diagnostic method on capillary or venous blood samples became available with the Bilistick® System 1.0 (BM-BS 1.0 - Bilimetrix, Italy).21 This direct method consists of a hand-held, rechargeable battery reflectance reader and test strips composed of a blood plasma separator coupled with an NC membrane, both encased in a plastic cassette. After loading the whole blood sample on the strip (35 µL - hematocrit up to 70%), it requires less than 100 s for serum separation and NC membrane saturation, depending on the hematocrit of the sample (identified automatically by the reader using light reflectance measurements to detect serum flow stabilization). The reader measures reflected light from the plasma-saturated NC membrane, using a light-emitting diode (LED) with an emission peak at 465 nm for quantifying bilirubin. A second LED of 570 nm detects whether hemoglobin contamination is present. The instrument is internally calibrated to optimize sensitivity and provides TSB measuring in a range of 17 to 684 µmol/L. The accuracy of the Bilistick® System 1.0 device was documented by comparing results with TSB measurements from routine laboratories. In 2018, Greco and colleagues reported the performance of the Bilistick® System 1.0 for identifying SNH in a multi-country approach in 1911 newborns. They showed that the TSB level measured by Bilistick® System 1.0 was not significantly different from laboratory TSB values in all four countries.22 The Bilistick® System had a positive predictive value (PPV) of 92.5% and a negative predictive value (NPV) of 92.8%. 
When Greco and colleagues compared the Bilistick® System 1.0 with both transcutaneous bilirubinometry (JM-103) and laboratory TSB results (Synchron CX PRU 16360, Beckman-Coulter, USA), they found the Bilistick® System 1.0 to be a good alternative to transcutaneous bilirubin determination for early diagnosis and proper management of neonatal jaundice.23 In 2018, Thielemans and colleagues reported a rather high failure rate for the Bilistick® System 1.0, especially in highly humid climatic conditions and at high hematocrit values.24 In 2020, Kamineni and colleagues called for further improvement of the accuracy of the Bilistick.25 Despite these observations, the reliability and clinical use of the Bilistick® System 1.0 for measuring TSB were considered appropriate in other studies performed under similar weather conditions.22,26,27 POC diagnosis of hyperbilirubinemia has also been claimed by Tabatabaee and colleagues, who reported fast and reproducible TSB measurements in whole blood with a recently developed smartphone-based bilirubin assay kit using photoluminescent bacterial cellulose nanopaper.28 One of the main advantages of the portable POC bilirubin instruments is the much shorter turnaround time, that is, the interval between collecting the specimen and reporting the TSB result, compared to commonly used multi-analyzer instruments for TSB tests (Table 1).27 Low-cost POC instruments appear to be an effective alternative for the measurement of TSB in newborns, particularly when conventional laboratory methods are not available or inaccessible.

As shown by comparative analysis, the unacceptably high variability in TSB measurement among methods continues to pose a real challenge to result harmonization of clinical routine methods. When Lo and colleagues evaluated the trueness of neonatal TSB using value-assigned, commutable specimens in four major instrument groups (Dimension, Olympus, Synchron, and VITROS), they found a systematic error in TSB measurement associated primarily with the failure of instrument manufacturers to produce reliable bilirubin calibrators.17 High variabilities were also observed when comparing TSB levels in patient specimens on multiparameter analyzers, transcutaneous bilirubin, and direct spectrophotometry instruments.29 Other potential sources of inaccuracy include sample integrity and sample handling. These were, however, typically identified as random errors. An in-depth analysis of results reported by the College of American Pathologists Neonatal Bilirubin PT Survey from 2011 to 2015 showed how changes in TSB test results, when trying to recalibrate instruments, can lead to completely opposite clinical interpretations.30 Standardization of TSB measurement remains a formidable challenge for laboratory medicine.31,32

The relevance of correctly implementing the internationally endorsed reference measurement systems

Inaccuracy and non-equivalence of TSB results among IVD manufacturers are well known.17,30 But why are TSB tests not standardized adequately? It appears that there is insufficient awareness of the metrological traceability concept and its essential implementation through an unbroken chain of calibration hierarchies. Standardization is key to guaranteeing that TSB test results correspond properly to internationally agreed standards of a higher order (Fig. 1).33,34 To achieve global standardization of measurement results in medical laboratories, the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) and the JCTLM promote the concept of metrological traceability of test results to internationally accepted standards. For proper commercial test calibration, medical test results must be anchored to higher-order reference materials and higher-order reference procedures. In the case of total bilirubin, tests were standardized with a Doumas reference method for bilirubin (using diazo-based spectrophotometry) and National Institute of Standards and Technology Standard Reference Material (NIST SRM) 916a reference material according to a strict calibration hierarchy. The former NIST SRM 916a reference material has been out of stock for years, and it consisted of three isomers, two of which were not present in native human sera. It was therefore decided to use the specific molar extinction coefficient (ε) of 7649 m²/mol for bilirubin quantitation, as this is a superior way to control the accuracy of a standard solution. Consequently, an updated and extended reference measurement system (RMS) was established and has been in place since 2018. To that end, Klauke and colleagues re-evaluated the Doumas candidate Reference Method and established a next-generation RMS without any need for a calibrator or SRM.35 Currently, in this RMS total bilirubin is described as a so-called operative measurand, defined by a set of measurement parameters.
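How a fixed molar absorption coefficient can replace a calibrator may be illustrated with the Beer-Lambert law, c = A / (ε · l). The sketch below is a simplified illustration and not the reference measurement procedure itself: blank correction, the defined measurement conditions, and the exact dilution scheme of the procedure are not described in the text, so the 1 cm path length, the `dilution_factor` parameter, and the example absorbance are assumptions.

```python
EPSILON_M2_PER_MOL = 7649.0  # molar absorption coefficient quoted in the text


def concentration_umol_per_l(absorbance, path_length_m=0.01, dilution_factor=1.0):
    """Beer-Lambert estimate of concentration in µmol/L.

    path_length_m defaults to an assumed 1 cm cuvette (0.01 m).
    dilution_factor is a hypothetical correction for sample dilution.
    """
    if absorbance < 0:
        raise ValueError("absorbance must be non-negative")
    c_mol_per_m3 = absorbance / (EPSILON_M2_PER_MOL * path_length_m)
    # 1 mol/m^3 equals 1 mmol/L, i.e. 1000 µmol/L
    return c_mol_per_m3 * 1000.0 * dilution_factor


# Hypothetical absorbance reading (illustration only)
c = concentration_umol_per_l(0.7649)
print(f"concentration in the cuvette = {c:.1f} µmol/L")
```

Because ε is a conventional quantity value rather than a property of a particular calibrator lot, results obtained this way do not depend on the availability or isomer composition of a reference material.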

Fig. 1: Reference measurement system and metrological traceability chain for TSB.

This measurement system was adapted from ISO 17511:2020 and based on the re-evaluated reference measurement procedure from Klauke and colleagues.35 Total serum bilirubin is described as an operative measurand, as defined by the reference measurement procedure. Through this reference measurement system, with its unbroken chain of calibrator materials and methods, bilirubin test results are anchored and made traceable to the International System of Units. Measurement uncertainty should remain within allowable limits to make the test fit for the purpose. SI, International System of Units; ε, the molar absorption coefficient of bilirubin (conventional quantity value of 7649 m²/mol); Mfr, IVD manufacturer who supplies CE-IVD kits for routine analyses to the European market.

The German Society for Clinical Chemistry and Laboratory Medicine runs the Reference Laboratories International Federation of Clinical Chemistry (RELA IFCC) external quality assessment scheme. It periodically checks how reference laboratories perform using their RMP, i.e., the gold standard for total bilirubin (http://www.dgkl-rfb.de:81).

Notwithstanding the existence of approved reference measurement systems (the former Doumas and AACC RMS (1985) versus the new RMS (2018)),35 the levels of standardization and trueness of routine total bilirubin test results consistently differ among manufacturers, as demonstrated in external quality assessment (EQA) programs that use commutable EQA samples, and have often been questioned. Yet, TSB tests can only be declared fit for clinical purposes if results are accurate within permissible limits of measurement uncertainty. The latter is often translated into desirable analytic performance specifications derived from biological variation data. In the case of TSB, rational and desirable analytic performances in adults are a CVa (analytical variation) of <12.8%, an allowable bias of less than 10%, and a total allowable error of 31.1% (https://biologicalvariation.eu/).36 For neonatal bilirubin no biological variation data are available. Consequently, pediatricians can determine the analytic performance criteria needed to make the neonatal bilirubin test fit for purpose themselves.
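The three adult specifications quoted above are numerically consistent with the conventional linear model for total allowable error, TEa = bias + 1.65 × CVa, since 10% + 1.65 × 12.8% = 31.1%. The text does not state that this model was used, so the sketch below is an assumption-based illustration; the function names and the observed values in the example are hypothetical.

```python
def total_allowable_error(bias_percent, cv_analytical_percent, z=1.65):
    """Conventional linear model: TEa = |bias| + z * CVa (all in percent)."""
    return abs(bias_percent) + z * cv_analytical_percent


def within_allowable_error(observed_bias, observed_cv, tea_limit, z=1.65):
    """Check whether an observed bias and CV stay within a TEa limit."""
    return total_allowable_error(observed_bias, observed_cv, z) <= tea_limit


# Specifications quoted in the text for adult TSB
tea = total_allowable_error(bias_percent=10.0, cv_analytical_percent=12.8)
print(f"TEa = {tea:.1f}%")

# Hypothetical laboratory performance (illustration only)
print(within_allowable_error(observed_bias=4.0, observed_cv=5.0, tea_limit=tea))
```

The same check could be applied to neonatal TSB once clinically derived limits are agreed upon, as no biological variation data are available for that population.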

Relevance of accuracy-based external quality assessment for trueness verification

External quality assessment (EQA) plays an essential role in helping to assure the quality of laboratory medicine on a daily basis. EQA schemes may reveal significant and systematic between-method variability for measurements of the same analyte in the same specimen. Detection of between-method variability through EQA is also a major driver for further improvement of test standardization.

Medical laboratories are obliged to participate in EQA, and when used effectively it can provide many opportunities for improving test accuracy. In the Netherlands, an accuracy-based EQA was developed by the Dutch Foundation for Quality Assessment in Medical Laboratories (Stichting Kwaliteitsbewaking Medische Laboratoria, SKML) for general chemistry analytes. Thus far, however, this has not included TSB. To develop an accuracy-based EQA, native, commutable, value-assigned EQA samples are essential to give medical laboratories insight into the trueness and imprecision of their bilirubin tests as well as other chemistry tests. In this way, inaccuracy and absolute bias resulting from, for example, lot-to-lot variation or method changes can be monitored in a longitudinal and sustainable way by every participating laboratory. To date, no proven commutable EQA samples have been developed for TSB, because the EQA samples used until now are spiked with unconjugated and/or conjugated bilirubin. It was demonstrated that the TSB recoveries are affected by the spiking material, hampering trueness verification. Nevertheless, interlaboratory and intermethod variations are monitored bi-weekly for TSB in approximately 185 Dutch laboratories. Figure 2 demonstrates the interlaboratory and intermethod variations in Dutch medical laboratories using common reagents. In the recent EQA surveys, SKML 2019.4 and SKML 2020.1, overall interlaboratory CVs ranged from 3% to 6% for TSB in the concentration range of 13 to 110 µmol/L. The interlaboratory spread has improved compared to the situation in 2009 (data not shown).
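The interlaboratory CV reported in such surveys is the standard deviation of the participants' results for one sample, expressed as a percentage of their mean. A minimal sketch is given below; the result values are hypothetical and are not SKML data, and actual EQA schemes may apply outlier removal or robust statistics that are not shown here.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: 100 * sample SD / mean."""
    if len(values) < 2:
        raise ValueError("need at least two results")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical TSB results (µmol/L) from four laboratories for one EQA sample
results = [48.0, 50.0, 52.0, 50.0]
cv = coefficient_of_variation(results)
print(f"interlaboratory CV = {cv:.1f}%")
```

A low interlaboratory CV indicates that laboratories agree with each other. It does not by itself show that they agree with the reference value, which is why value-assigned samples are needed for trueness verification.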

Fig. 2: Total serum bilirubin measured in EQA samples by ~185 medical laboratories, common IVD manufacturers, and a JCTLM-endorsed reference laboratory.

Pooled human serum was supplemented with bilirubin. Serum samples were dispensed, frozen below −70 °C, and shipped on dry ice to the laboratories participating in the regular general clinical chemistry external quality assessment program of the Dutch Foundation for Quality Assessment in Medical Laboratories (SKML). In 2009 (a) and 2020 (c), samples were supplemented with unconjugated bilirubin alone (>98%, mixed isomers, Sigma-Aldrich). In 2019 (b), both unconjugated bilirubin and conjugated bilirubin were added. Each participating laboratory measured total bilirubin. Results are plotted with the high-concentration sample on the x axis and the low-concentration sample on the y axis (17.1 µmol/L = 1 mg/dL). Colored squares represent mean bilirubin concentrations ± 1 standard deviation in both samples, as measured by the routine medical laboratories in the quality assessment program and using the methods by the respective manufacturers as indicated. The numbers inside the colored squares indicate the numbers of participants for each category. In panel a, stars represent bilirubin concentrations measured in-house by the corresponding manufacturers. The red dots are the targets set in the JCTLM-listed reference laboratory of Prof. Gerhard Schumann and Dr. Denis Grote-Koska in Hannover, Germany (Institut für Klinische Chemie - Zentrallabor, Medizinische Hochschule Hannover). Panel a was adapted from Cobbaert and colleagues.10 Data for panels b and c were supplied by the Dutch Foundation for Quality Assessment in Medical Laboratories and used with their permission.

Other methods to study bilirubin metabolism—determination of bilirubin in tissues and cells

It is well known that the concentration of unconjugated free bilirubin correlates better with bilirubin neurotoxicity in comparison to total serum bilirubin concentration. Because bilirubin behaves like a real signaling molecule,37 its blood concentrations can serve only as a rough surrogate marker of bilirubin metabolism within the cells. Although serum or plasma bilirubin concentrations are in dynamic equilibrium with other biological compartments, its tissue and cell concentrations differ substantially even within one organ as proved, for example, in the brain.38 Knowledge of bilirubin kinetics and dynamics within individual human body compartments is essential to understand its role in the pathophysiology of various clinical conditions. Besides, bilirubin undergoes extensive metabolization, in particular, due to oxidation and photooxidation processes, forming tetra-, tri-, di- as well as monopyrrolic oxidation derivatives, which are likely to exert biological activities and may serve as important biomarkers of pathological conditions. Because cellular bilirubin concentrations are within the submicromolar range, the standard analytic, mostly diazo reaction-based methods, which are used in routine clinical chemistry, have insufficient sensitivity, and do not enable quantitation of bilirubin in the cells, tissues, and organs. This drawback of standard clinical chemistry methods is eliminated by using high-performance liquid chromatography (HPLC) techniques. 
These techniques enable accurate separation and quantitation of individual bilirubin fractions, such as delta bilirubin, unconjugated bilirubin, bilirubin monoglucuronosides, and diglucuronosides, and can be used under specific circumstances even in clinical settings.39 Simultaneously, HPLC methods overcome the overestimation of bilirubin concentrations caused by the presence of unidentified diazo-positive compounds distinct from bilirubin.40 High-performance liquid chromatography methods are also capable of differentiating various bilirubin isomers present under certain conditions in the biological systems.41 Importantly, determination of the bilirubin subfractional changes, including detection of delta bilirubin, may help in the prediction of the risk of human diseases, such as cholestasis or gallstone disease, or in the differential diagnosis of such diseases.42,43 Interestingly, the first method for separation and quantitative estimation of serum and biliary bilirubin fractions from serum and of three bilirubin fractions from bile was published as early as 1966.44 Since that time, various chromatographic approaches were explored and a number of methods developed, including separation of native as well as derivatized bilirubins, such as ethyl anthranilate azo pigments or bilirubin methyl esters (for a review see ref. 44). These methods were gradually improved. The separation was enhanced by stepping from isocratic, normal-phase HPLC, to various gradient, reverse-phase systems.45 The use of internal standards, such as xanthobilirubic acid40 or mesobilirubin,46 led to improvements in accuracy. Further enhancement was reached with the employment of mass spectrometric detection.47 With these analytic advances, methods for the detection of bilirubin and its metabolites in tissues and cells were established. 
In contrast to the early insufficient attempts to determine bilirubin in brain tissue, which were based on spectrophotometry,48 HPLC-based methods demonstrated much higher sensitivity and accuracy. Using a newly developed HPLC method based on C8-column separation with the implementation of the methanol/water/tetrabutyl ammonium hydroxide mobile phase and equipped with the diode array detector, it was possible to detect as little as 10 pmol of bilirubin per gram of tissue.46 This method was used in experimental studies quantifying bilirubin in numerous organs, including the heart,49 and especially brain tissues,38,50,51,52,53,54,55,56 which are essential to understand the pathophysiology of bilirubin neurotoxicity.

Determination of bilirubin photoisomers and oxidation products

With increasing knowledge on the biological importance of bilirubin derivatives formed during oxidation processes,57 sensitive and accurate analytic methods are becoming essential. These derivatives include tetra-, tri-, di-, and monopyrrolic bilirubin oxidation derivatives. Probably the most clinically important are bilirubin photoisomers formed during phototherapy (PT) of severe unconjugated hyperbilirubinemia. However, the determination of these bilirubin derivatives in biological material is not trivial, because of the lack of commercial standards as well as the instability of the pigments. Several methods for the determination of bilirubin photoisomers have been published.58 Previously, an improved analytic HPLC method for the simultaneous determination of major bilirubin photoisomers, lumirubin, Z,E- and E,Z-bilirubins, and bilirubin was described using lumirubin as well as internal standards with tandem mass detection.57 The method was validated on serum samples of jaundiced neonates treated with PT. It has the potential of facilitating our understanding of the kinetics and biology of bilirubin photoproducts, which to date are practically unknown. Research into other bilirubin oxidation products is also progressing. Tripyrrolic biopyrrins, which are clinically relevant markers of increased oxidative stress, can be analyzed by immunochemical methods using specific anti-bilirubin monoclonal antibodies.59 Reliable analytic methods for dipyrrolic propentdyopents and monopyrrolic bilirubin oxidation products, Z-BOX A and B, with potential clinical implications were also published recently.60,61 Finally, tetrapyrrolic compounds and their oxidation products were also studied recently in plants using these novel analytic methods, which will certainly improve our understanding of the biological relevance of these pigments.62,63

Summary and conclusion

Invasive TSB measurements remain the gold standard on which the definitive diagnosis of SNH is based. According to international guidelines for neonatal jaundice management, the clinical decision for treatment of neonatal hyperbilirubinemia should be based on bilirubin levels measured in blood by diagnostic instrumentation. Any non-invasive bilirubin estimation must be confirmed by an invasive diagnostic method. Bench-top bilirubinometers and hand-held POC instruments have the advantages over multiparameter instruments of being cheaper and faster. Test results are available immediately, in contrast to results of TSB measurements using multiparameter instruments in a central laboratory. In addition, less blood is needed, as the bench-top bilirubinometers and hand-held POC instruments require minimal sample volumes. Novel POC bilirubin measurement methods, such as the BiliSpec and the Bilistick® System, are of interest for many newborn infants, especially in LMICs, where access to costly multiparameter instruments is limited. The main disadvantages of hand-held POC instruments and bench-top bilirubinometers are that agreement with routine laboratory TSB varies and that they are still not included in EQA programs. TSB test results on these instruments should be accurate within permissible limits of measurement uncertainty to be fit for clinical purposes. The key to accomplishing this is anchoring TSB test results to the latest internationally endorsed RMS for bilirubin. In addition, participation in EQA programs for TSB in the neonatal range and close interdisciplinary cooperation between physicians and clinical chemists are needed to assure the desired analytic and clinical performance of TSB testing. It is surprising that, more than a century and a half after bilirubin was first measured, uncertainty still exists on how to correctly assess the concentration of this yellow pigment.
Universal implementation of endorsed calibration hierarchies for test standardization remains a daunting task. Recently, analytic methods for bilirubin measurement in biological matrices, such as HPLC thermal lens spectrometry, spectrophotometric, molecular imprinting, and piezoelectric techniques were developed. These methods employ novel techniques that could further accelerate bilirubin research to improve the management of newborn infants with SNH.