The Accuracy of Hospital ICD-9-CM Codes for Determining Sickle Cell Disease Genotype

Angela B. Snyder, Peter A. Lane, Mei Zhou, Susan T. Paulukonis, Mary M. Hulihan

Georgia State University, Department of Public Management and Policy, Atlanta, GA and Georgia State University, Georgia Health Policy Center, Atlanta, GA
Aflac Cancer and Blood Disorders Center, Children's Healthcare of Atlanta, Atlanta, GA and Department of Pediatrics, Emory University School of Medicine, Atlanta, GA
Georgia State University, Georgia Health Policy Center, Atlanta, GA
Public Health Institute, Richmond, CA
Centers for Disease Control and Prevention, Division of Blood Disorders, Atlanta, GA

Sickle cell disease affects more than 100,000 individuals in the United States, among whom disease severity varies considerably. One factor that influences disease severity is the sickle cell disease genotype. For this reason, clinical prevention and treatment guidelines tend to differentiate between genotypes. However, previous research suggests caution when using a claims-based determination of sickle cell disease genotype in healthcare quality studies.

The objective of this study was to describe the extent of miscoding for the major sickle cell disease genotypes in hospital discharge data. Individuals with sickle cell disease were identified through newborn screening results or hemoglobinopathy specialty care centers, along with their sickle cell disease genotypes. These genotypes were compared to the diagnosis codes listed in hospital discharge data to assess the accuracy of the hospital codes in determining sickle cell disease genotype. Eighty-three percent (sickle cell anemia), 23% (Hemoglobin SC), and 31% (Hemoglobin Sβ+ thalassemia) of hospitalizations contained a diagnosis code that correctly reflected the individual’s true sickle cell disease genotype. The accuracy of the sickle cell disease genotype coding was indeterminate in 11% (sickle cell anemia), 12% (Hemoglobin SC), and 7% (Hemoglobin Sβ+ thalassemia) and incorrect in 3% (sickle cell anemia), 61% (Hemoglobin SC), and 52% (Hemoglobin Sβ+ thalassemia) of the hospitalizations. The use of ICD-9-CM codes from hospital discharge data for determining specific sickle cell disease genotypes is problematic. Research based solely on these or other types of administrative data could lead to incorrect understanding of the disease.

Background: Sickle cell disease (SCD) is characterized by chronic hemolytic anemia and a wide variety of acute and chronic complications caused by intermittent episodes of vaso-occlusion, vascular injury, and organ damage. SCD affects more than 100,000 individuals in the United States1, and with recent advances in care, individuals with SCD are living longer2. Even so, disease severity varies considerably among individuals. While some exhibit severe complications and die before reaching middle age, others are far less symptomatic3.

One factor that influences disease severity is the SCD genotype. In the United States, most individuals with SCD have homozygous hemoglobin (Hb) SS disease (Hb SS), and the remainder have a compound heterozygous form of SCD caused by co-inheritance of Hb S with a different beta globin mutation such as Hb C or a variety of β thalassemia mutations4. The Hb Sβ thalassemias are divided into two groups depending on the severity of the β thalassemia mutation; mutations that result in absent production of β globin are termed Hb Sβ0 thalassemia, while those that result in reduced, but not absent, production of β globin are termed Hb Sβ+ thalassemia. Individuals with Hb SS or Hb Sβ0 thalassemia are classified as having sickle cell anemia (SCA) because their hematological phenotype is characterized by a more severe chronic hemolytic anemia than that of individuals with Hb SC or Hb Sβ+ thalassemia, who have less hemolysis and less severe or no anemia. In the United States, Hb SC is the most prevalent of the compound heterozygous SCD genotypes, followed by Hb Sβ+ thalassemia and then Hb Sβ0 thalassemia. A recent population-based surveillance study from six states found that 61.4% of individuals with SCD had SCA, 28.3% had Hb SC, 8.5% had Hb Sβ+ thalassemia, and 1.8% had other compound heterozygous forms of SCD5.

It is sometimes difficult to distinguish between the two SCA genotypes because the co-inheritance of α thalassemia in persons with Hb SS results in significant overlap of their complete blood count and hemoglobin electrophoresis results with those of persons with Hb Sβ0 thalassemia. Therefore, the genotypes cannot be reliably differentiated without DNA analysis. However, the distinction between SCA and Hb SC/Hb Sβ+ thalassemia is important because the risk for some of the more severe complications of SCD, such as stroke, is much higher in SCA than in Hb SC/Hb Sβ+ thalassemia, especially during childhood6. For this reason, clinical prevention and treatment guidelines tend to differentiate between genotypes in their practice recommendations based on evidence of clinical effectiveness and differing risk-benefit trade-offs. For example, annual transcranial Doppler screening for prevention of stroke in childhood is recommended for children with SCA but not for those with Hb SC or Hb Sβ+ thalassemia. Similarly, indications for use of hydroxyurea are much broader and more inclusive for SCA compared to Hb SC or Hb Sβ+ thalassemia7.

International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes have been commonly used by public health officials and health services researchers to identify individuals with SCD to evaluate complications, adherence to treatment guidelines, and the quality of care received. Because researchers have questioned the accuracy of SCD genotypes based on ICD-9-CM codes8,9, studies usually identify patients based on any SCD code (282.41-42, 282.6, 282.60-64, 282.68-69) in any position (primary, secondary, or other)10-13. A few of these studies have reported results by genotype, including reports detailing rehospitalizations14, influenza15, stroke16, and surgical outcomes for individuals with SCD17. Furthermore, some studies have been restricted to individuals with Hb SS codes only18-20, and others were limited to individuals with ICD-9-CM codes for SCD crisis21-24. In only one instance has the accuracy of coding by SCD genotype been reported25. In that report, an analysis of emergency department (ED) visits at a children’s hospital found that ICD-9-CM codes for Hb SS were 87% accurate, but the Hb SS codes were very often used for ED visits of patients with other genotypes. Therefore, the authors suggested caution when using a claims-based determination of SCD genotype in healthcare quality studies.

Georgia and California have a unique set of SCD surveillance data collected through the Registry and Surveillance System for Hemoglobinopathies (RuSH)5. These population-based data can be used to test the accuracy of hospital discharge data for identifying SCD genotypes and, in turn, their utility in evaluating adherence to practice recommendations that differ by genotype. This is possible by comparing ICD-9-CM codes from the hospital discharge data to laboratory-based information collected from clinical sites and state newborn screening programs within the RuSH surveillance datasets. The purpose of this study was to use the hospital discharge records of a population-based sample to (1) describe the extent of miscoding for the major SCD genotypes and estimate the accuracy of ICD-9-CM codes and (2) report the positive predictive value of using Hb SS ICD-9-CM codes only (282.61 and 282.62) to identify individuals with SCA.

Methods: The methods that California and Georgia used to collect and link population-based SCD surveillance data for 2004 to 2008 from a variety of sources were previously described5. The analyses presented here include individuals with SCD who were reported by newborn screening and/or one of the participating hemoglobinopathy specialty treatment centers (Georgia: Georgia Comprehensive Sickle Cell Center at Grady Memorial, Georgia Health Sciences University, and all three campuses of Children’s Healthcare of Atlanta; California: University of California (UC) San Francisco Benioff Children’s Hospital Oakland and San Francisco, Children’s Hospital Los Angeles, UC Irvine Medical Center, Rady UC San Diego, UC Davis Medical Center) with a laboratory-confirmed diagnosis based on results of clinical laboratory evaluation that included quantitative hemoglobin identification by hemoglobin electrophoresis, high-performance liquid chromatography, or DNA analysis. These individuals were then linked to inpatient files from each state’s hospital discharge data (all records from 2004 to 2008) using multiple patient identifiers. This study included only individuals who had at least one hospitalization reported during the five-year period.

The hospital discharge data from Georgia included a maximum of ten diagnosis codes per admission; in the California data there were up to 25 diagnosis codes per admission. All of the diagnosis codes were scanned for the presence of an SCD ICD-9-CM code. These SCD codes were then compared to the known SCD genotype for the individual, and a determination of “correct,” “indeterminate,” “incorrect,” or “no code” was made, based on whether or not the diagnosis code and the known SCD genotype were considered to be a correct match (Figure 1). These categorizations were defined from the perspective of the known SCD genotype. That is, for a hospitalization of an individual with SCA, the “sickle-cell thalassemia with/without crisis” (282.41, 282.42) ICD-9-CM codes would be “correct” if the individual had Hb Sβ0 thalassemia, but incorrect if the individual had Hb SS, so the comparison was termed “indeterminate.” On the other hand, a hospitalization of an individual with Hb Sβ+ thalassemia would be termed “correct,” if the same codes were listed.
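
The matching rule described above can be sketched in code. The following is a minimal Python illustration, not the authors' SAS implementation. The genotype labels ("SCA", "SC", "Sb+"), the function and table names, and the treatment of unspecified or "other" SCD codes as indeterminate are all assumptions made for illustration, because the full mapping in Figure 1 is not reproduced in the text. Admissions with more than one distinct SCD code are flagged as "excluded", on the assumption that this mirrors the study's removal of such admissions.

```python
# Code families, using the ICD-9-CM codes quoted in the text.
HB_SS_CODES = {"282.61", "282.62"}    # hemoglobin-SS disease with/without crisis
S_THAL_CODES = {"282.41", "282.42"}   # sickle-cell thalassemia with/without crisis
HB_SC_CODES = {"282.63", "282.64"}    # sickle-cell/Hb-C disease with/without crisis
# Assumption: unspecified and "other" SCD codes carry no genotype information.
NONSPECIFIC_CODES = {"282.6", "282.60", "282.68", "282.69"}
ALL_SCD_CODES = HB_SS_CODES | S_THAL_CODES | HB_SC_CODES | NONSPECIFIC_CODES

# (known genotype, code family) -> category, from the perspective of the
# known genotype. Labels "SCA", "SC", and "Sb+" are hypothetical shorthand.
MATCH_TABLE = {
    ("SCA", "ss"): "correct",
    ("SCA", "s_thal"): "indeterminate",  # right for Hb Sβ0 thalassemia, wrong for Hb SS
    ("SCA", "sc"): "incorrect",
    ("SC", "ss"): "incorrect",
    ("SC", "s_thal"): "incorrect",
    ("SC", "sc"): "correct",
    ("Sb+", "ss"): "incorrect",
    ("Sb+", "s_thal"): "correct",
    ("Sb+", "sc"): "incorrect",
}


def code_family(code):
    """Map a single SCD ICD-9-CM code to its code family."""
    if code in HB_SS_CODES:
        return "ss"
    if code in S_THAL_CODES:
        return "s_thal"
    if code in HB_SC_CODES:
        return "sc"
    return "nonspecific"


def classify_admission(known_genotype, diagnosis_codes):
    """Compare one admission's diagnosis codes with the known genotype."""
    scd_codes = {c for c in diagnosis_codes if c in ALL_SCD_CODES}
    if not scd_codes:
        return "no code"
    if len(scd_codes) > 1:
        return "excluded"  # assumption: more than one distinct SCD code
    family = code_family(next(iter(scd_codes)))
    if family == "nonspecific":
        return "indeterminate"  # assumption, see NONSPECIFIC_CODES
    return MATCH_TABLE[(known_genotype, family)]


if __name__ == "__main__":
    # 486 is a non-SCD code (pneumonia), included only as filler.
    print(classify_admission("SCA", ["282.41", "486"]))
    print(classify_admission("Sb+", ["282.41", "486"]))
```

In this sketch the same sickle-cell thalassemia code yields "indeterminate" for an individual with SCA and "correct" for an individual with Hb Sβ+ thalassemia, which is the asymmetry the paragraph describes.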


Figure 1: Comparison of known sickle cell disease (SCD) genotype and SCD ICD-9-CM hospital discharge code, California and Georgia, 2004-2008

Correct = dots; indeterminate = horizontal lines; incorrect = crosshatch

Hospital admissions for individuals with SCD genotypes other than Hb SS/Hb Sβ0 thalassemia, Hb SC, or Hb Sβ+ thalassemia are provided in Table 1, but were excluded from further analyses (n=78 California; n=68 Georgia). Admissions that included more than one SCD-related ICD-9-CM diagnosis code were also removed (n=18 California; n=119 Georgia). The data analysis for this study was generated using SAS® version 9.3 (SAS Institute, 2010).

Results: The RuSH surveillance project identified a total of 6,264 individuals with a known SCD genotype in California and Georgia; 1,976 in California and 4,288 in Georgia. Of these, 3,961 (63.2%) accounted for a total of 27,439 hospitalizations during 2004 through 2008 (Table 1); 43.2% were for pediatric patients aged 18 years or younger at the time of admission, and 46.8% were for male patients. Individuals with SCA accounted for 72.5% of the patients with a known genotype and 79.9% of the hospital discharges; Hb SC accounted for 21.4% of patients and 15.3% of discharges; Hb Sβ+ thalassemia accounted for 5.5% of patients and 4.2% of discharges.


Table 1: Description of patients with sickle cell disease and their hospital admissions, California and Georgia, 2004-2008

ICD-9-CM coding correctly identified SCA genotypes in 82.9% of hospitalizations (78.9% in CA, 84.5% in GA), while the coding was incorrect in 3.4% (4.5% in CA, 2.9% in GA) and indeterminate in 11.4% (15.4% in CA, 9.7% in GA). However, coding for Hb SC was correct in only 22.8% (24.4% in CA, 22.1% in GA) of hospitalizations, incorrect in 60.9% (64.5% in CA, 59.3% in GA) and indeterminate in 11.5% (8.0% CA, 13.1% GA). Individuals with Hb Sβ+ thalassemia were correctly coded in only 30.5% of hospitalizations overall; substantially higher in CA (56.2%) than in GA (22.4%). Coding for Hb Sβ+ thalassemia was incorrect in 52.2% (26.3% in CA, 60.3% in GA), and indeterminate in 7.0% (6.6% in CA, 7.2% in GA) (Figure 2). Two percent of hospitalizations for individuals with SCD in California and 4% in Georgia did not contain an SCD ICD-9-CM diagnosis code. The lack of any SCD diagnosis code was higher for Hb Sβ+ thalassemia (10.3%) compared with Hb SC (4.7%) and SCA (2.3%).
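
As an arithmetic check on the overall percentages in this paragraph, the short Python sketch below confirms that the four categories for each genotype account for essentially all hospitalizations, and derives the share of hospitalizations without a correct code. The numbers are the overall figures reported above; the dictionary layout and the function name are illustrative assumptions.

```python
# Overall percentages (California and Georgia combined) as reported in the text.
overall = {
    "SCA": {"correct": 82.9, "incorrect": 3.4, "indeterminate": 11.4, "no code": 2.3},
    "Hb SC": {"correct": 22.8, "incorrect": 60.9, "indeterminate": 11.5, "no code": 4.7},
    "Hb Sb+ thalassemia": {"correct": 30.5, "incorrect": 52.2, "indeterminate": 7.0, "no code": 10.3},
}


def summarize(shares):
    """Return (sum of all categories, share lacking a correct code), in percent."""
    total = round(sum(shares.values()), 1)
    not_correct = round(100 - shares["correct"], 1)
    return total, not_correct


if __name__ == "__main__":
    for genotype, shares in overall.items():
        total, not_correct = summarize(shares)
        print(f"{genotype}: categories sum to {total}%, "
              f"{not_correct}% lack a correct code")
```

The categories sum to 100% within rounding for each genotype, and the share of hospitalizations lacking a correct code works out to 17.1% for SCA, 77.2% for Hb SC, and 69.5% for Hb Sβ+ thalassemia.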


Figure 2a. Accuracy of ICD-9-CM hospital discharge codes for determining sickle cell disease genotype, California, 2004-2008.


Figure 2b. Accuracy of ICD-9-CM hospital discharge codes for determining sickle cell disease genotype, Georgia, 2004-2008.

In our cohort of hospitalizations for individuals with SCD from both California and Georgia, using Hb SS ICD-9-CM codes only (282.61, 282.62) versus any SCD code to identify a pediatric patient with SCA improved the positive predictive value (PPV) from 84% (9,850/11,740) to 93% (7,750/8,306); however, in the adult population the PPV only slightly improved from 79% (12,229/15,435) to 81% (10,352/12,752) (Table 2).
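
The positive predictive values quoted above follow directly from the counts in parentheses: PPV is the number of flagged hospitalizations that truly belong to individuals with SCA divided by all flagged hospitalizations. A minimal Python sketch of that arithmetic is shown below; the counts come from the text, while the function name and dictionary keys are illustrative assumptions.

```python
def ppv(true_positives, flagged):
    """Positive predictive value: true positives / all flagged records."""
    return true_positives / flagged


# (hospitalizations of individuals with SCA, hospitalizations flagged by the code set)
counts = {
    ("pediatric", "any SCD code"): (9850, 11740),
    ("pediatric", "Hb SS codes only"): (7750, 8306),
    ("adult", "any SCD code"): (12229, 15435),
    ("adult", "Hb SS codes only"): (10352, 12752),
}

ppv_percent = {key: round(100 * ppv(tp, n)) for key, (tp, n) in counts.items()}

if __name__ == "__main__":
    for (age_group, code_set), value in ppv_percent.items():
        print(f"{age_group}, {code_set}: PPV = {value}%")
```

Restricting to Hb SS codes shrinks the flagged set in both age groups, but the proportion of true SCA hospitalizations rises by about nine percentage points in children and only about two in adults.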


Table 2: Positive predictive value of using Hemoglobin SS ICD-9-CM hospital discharge codes (282.61 or 282.62) to identify individuals with sickle cell anemia, California and Georgia, 2004-2008

Discussion: One of the most important distinctions to make when treating patients with SCD, or when evaluating their health and healthcare utilization patterns, is between patients with SCA genotypes and patients with other forms of SCD, because patients with SCA are at a higher risk for many disease-related complications and hospitalization14,26. However, the use of ICD-9-CM codes for determining specific SCD genotypes is problematic for multiple reasons. First, although it is sometimes difficult to distinguish Hb SS from Hb Sβ0 thalassemia based on results of hemoglobin electrophoresis or other non-DNA tests, there is no single ICD-9-CM code for SCA; rather, there are separate codes for “hemoglobin-SS disease with/without crisis” (282.61, 282.62) and “sickle-cell thalassemia with/without crisis” (282.41, 282.42). Furthermore, the latter ICD-9-CM codes do not differentiate between Hb Sβ0 thalassemia and Hb Sβ+ thalassemia, although the former genotype causes SCA and the latter does not. The difficulty in distinguishing between SCA genotypes, even when laboratory-based results are available, is the reason that the SCD surveillance data collected by Georgia and California, and used for the analyses presented here, utilized an SCA classification rather than separate Hb SS and Hb Sβ0 thalassemia genotypes. Moreover, the limitations of the ICD-9-CM codes for determining SCD genotypes were the reason that the SCD surveillance data collected on individuals without a newborn screening report or diagnosis from a hemoglobinopathy specialty treatment center did not contain genotype information and were, therefore, excluded from this analysis.

In addition to the inherent limitations of the SCD ICD-9-CM codes themselves, our study indicates that there is also misuse of these codes, as has been previously reported25. That study from a single pediatric emergency department noted that the accuracy of correctly identifying individuals with Hb SS was 87%, with sensitivity, specificity, and PPV of 87%, 79%, and 87%, respectively. Our study showed that the accuracy of identifying individuals with SCA in hospital discharge data was similar. However, the accuracy of ICD-9-CM codes for identifying other SCD genotypes was poor, especially for individuals with Hb SC; over 75% of their hospitalizations contained an ICD-9-CM code that did not match their true SCD genotype. For Hb Sβ+ thalassemia, accuracy was also low: only 31% of hospitalizations were correctly coded, leaving 69% without a matching code. This miscoding may occur, in part, because some healthcare providers and coders do not appreciate the differences among SCD genotypes, leading to the use of Hb SS codes for non-Hb SS genotypes. While these analyses were based on ICD-9-CM codes, it is important to recognize that the ICD-10-CM codes for SCD that are now being used have similar limitations.

The lack of SCA ICD-9-CM codes, ambiguous codes for Hb Sβ0 thalassemia vs Hb Sβ+ thalassemia, and the high rates of mismatch between true SCD genotype and the SCD ICD-9-CM codes in hospital discharge data may lead to the erroneous interpretation of such data and mistaken conclusions. Using hospital discharge data to analyze hospital admission and/or readmission rates, the rate of a particular surgical procedure, or adherence to recommended therapy by genotype could incorrectly classify up to 15% of individuals with SCA, three-fourths of those with Hb SC, and over two-thirds of those with Hb Sβ+ thalassemia. Errors such as these could lead to incorrect understanding of the disease and adherence to recommended management and treatment. Our results suggest that caution should be exercised when interpreting research that relies solely on hospital discharge data to identify and determine the genotype of individuals with SCD.

The results presented here further suggest that using only Hb SS ICD-9-CM codes to identify individuals with SCA improves the genotypic accuracy of the resulting sample for pediatric patients when compared to using all SCD ICD-9-CM codes, but does little to improve identification of adults with SCA. These results are due, in part, to the high rates of miscoding for non-SCA genotypes, as well as the higher rates of hospitalization in individuals with SCA (80% of the SCD hospitalizations in this analysis, but only 73% of the patients with a hospitalization and 64% of all identified individuals with SCD, regardless of hospitalization).

A fundamental strength of this study is the identification of cohort members through population-based surveillance, rather than identification solely in administrative data or SCD specialty clinics. While the use of administrative data alone requires fewer resources, such data are often incomplete and may suffer from suboptimal quality. Data from SCD specialty clinics have higher quality and more complete reporting, but contain a limited number of cases and may not represent a population-based sample of all patients.

This analysis also has limitations. It has a narrow focus on hospital discharge data, thus excluding the 36.8% of patients with known SCD genotypes who were not hospitalized during the five-year period. Results from other commonly used administrative datasets, such as Medicaid or emergency department claims data, may show levels of congruence between true SCD genotype and ICD-9-CM genotype that vary from the results presented here. Furthermore, this study did not include individuals with SCD for whom a laboratory-confirmed genotype was not available, that is, individuals who were neither born in California or Georgia during the years of active newborn screening nor receiving care from one of the participating clinics. Finally, there were a large number of hospitalizations during this five-year study period that contained an SCD ICD-9-CM code, but were not linked to our cohort.

These findings underscore the need for surveillance systems that can accurately identify individuals living with SCD along with their correct genotype, track their health care utilization, and monitor their quality of care by measuring adherence to practice guidelines in ways that have the potential to reduce SCD-associated morbidity and mortality and improve care. Until a national registry or other data collection system is available for SCD in the US, researchers will be limited by the accuracy of ICD-9-CM or, since late 2015, ICD-10-CM codes for determining SCD genotype. Therefore, caution must be used when interpreting results and applying findings to the development of practice guidelines or recommendations for clinical care.

Note: This project was reviewed by CDC and was determined to be a non-research, public health practice activity. Both the California Committee for Protection of Human Subjects and the Georgia Public Health Department Institutional Review Board declared the project exempt from review as a public health surveillance effort; specialty hemoglobinopathy treatment center institutional review boards similarly exempted the project from review. State data requests were reviewed by the appropriate agency, assuring data privacy safeguards were in place. The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

This work was supported by the Centers for Disease Control and Prevention Foundation.

The authors report no conflicts of interest.

  1. Hassell KL. Population estimates of sickle cell disease in the U.S. Am J Prev Med. 2010; 38(4 Suppl): S512-21.
  2. Paulukonis ST, Eckman JR, Snyder AB, et al. Defining sickle cell disease mortality using a population-based surveillance system, 2004 through 2008. Public Health Rep. 2016; 131(2): 367-75.
  3. Platt OS, Thorington BD, Brambilla DJ, et al. Pain in sickle cell disease. Rates and risk factors. N Engl J Med. 1991; 325(1): 11-6.
  4. Therrell BL, Hannon WH. National evaluation of US newborn screening system components. Ment Retard Dev Disabil Res Rev. 2006; 12(4): 236-45.
  5. Hulihan MM, Feuchtbaum L, Jordan L, et al. State-based surveillance for selected hemoglobinopathies. Genet Med. 2015; 17(2): 125-30.
  6. Serjeant GR. The natural history of sickle cell disease. Cold Spring Harb Perspect Med. 2013; 3(10): a011783.
  7. National Institutes of Health. Evidence-based management of sickle cell disease. 2014. [accessed on May 12, 2017]. Available at
  8. Amendah DD, Mvundura M, Kavanagh PL, et al. Sickle cell disease-related pediatric medical expenditures in the US. Am J Prev Med. 2010; 38(4 Suppl): S550-6.
  9. Grosse SD, Boulet SL, Amendah DD, et al. Administrative data sets and health services research on hemoglobinopathies: a review of the literature. Am J Prev Med. 2010; 38(4 Suppl): S557-67.
  10. McCavit TL, Lin H, Zhang S, et al. Hospital volume, hospital teaching status, patient socioeconomic status, and outcomes in patients hospitalized with sickle cell disease. Am J Hematol. 2011; 86(4): 377-80.
  11. McCavit TL, Xuan L, Zhang S, et al. Hospitalization for invasive pneumococcal disease in a national sample of children with sickle cell disease before and after PCV7 licensure. Pediatr Blood Cancer. 2012; 58(6): 945-9.
  12. McCavit TL, Xuan L, Zhang S, et al. National trends in incidence rates of hospitalization for stroke in children with sickle cell disease. Pediatr Blood Cancer. 2013; 60(5): 823-7.
  13. Okam MM, Shaykevich S, Ebert BL, et al. National trends in hospitalizations for sickle cell disease in the United States following the FDA approval of hydroxyurea, 1998-2008. Med Care. 2014; 52(7): 612-8.
  14. Brousseau DC, Owens PL, Mosso AL, et al. Acute care utilization and rehospitalizations for sickle cell disease. JAMA. 2010; 303(13): 1288-94.
  15. Bundy DG, Strouse JJ, Casella JF, et al. Burden of influenza-related hospitalizations among children with sickle cell disease. Pediatrics. 2010; 125(2): 234-43.
  16. Strouse JJ, Jordan LC, Lanzkron S, et al. The excess burden of stroke in hospitalized adults with sickle cell disease. Am J Hematol. 2009; 84(9): 548-52.
  17. Hyder O, Yaster M, Bateman BT, et al. Surgical procedures and outcomes among children with sickle cell disease. Anesth Analg. 2013; 117(5): 1192-6.
  18. Chakravarty EF, Khanna D, Chung L. Pregnancy outcomes in systemic sclerosis, primary pulmonary hypertension and sickle cell disease. Obstet Gynecol. 2008; 111(4): 927-34.
  19. Shankar SM, Arbogast PG, Mitchel E, et al. Medical care utilization and mortality in sickle cell disease: a population-based study. Am J Hematol. 2005; 80(4): 262-70.
  20. Raphael JL, Mueller BU, Kowalkowski MA, et al. Shorter hospitalization trends among children with sickle cell disease. Pediatr Blood Cancer. 2012; 59(4): 679-84.
  21. Ellison AM, Bauchner H. Socioeconomic status and length of hospital stay in children with vaso-occlusive crises of sickle cell disease. J Natl Med Assoc. 2007; 99(3): 192-6.
  22. Dinan MA, Chou CH, Hammill BG, et al. Outcomes of inpatients with and without sickle cell disease after high-volume surgical procedures. Am J Hematol. 2009; 84(11): 703-9. Erratum in: Am J Hematol. 2011; 86(10): 906-8.
  23. Panepinto JA, Brousseau DC, Hillery CA, et al. Variation in hospitalizations and hospital length of stay in children with vaso-occlusive crises in sickle cell disease. Pediatr Blood Cancer. 2005; 44(2): 182-6.
  24. Raphael JL, Mei M, Mueller BU, et al. High resource hospitalizations among children with vaso-occlusive crises in sickle cell disease. Pediatr Blood Cancer. 2012; 58(4): 584-90.
  25. Eisenbrown K, Nimmer M, Brousseau DC. The accuracy of using ICD-9-CM codes to determine genotype and fever status of patients with sickle cell disease. Pediatr Blood Cancer. 2015; 62(5): 924-5.
  26. Leikin SL, Gallagher D, Kinney TR, et al. Mortality in children and adolescents with sickle cell disease. Cooperative study of sickle cell disease. Pediatrics. 1989; 84(3): 500-8.

Article Info

Article Notes

  • Published on: July 28, 2017


  • Sickle cell disease
  • Sickle cell anemia
  • Genotype
  • Administrative data
  • Surveillance
  • ICD-9-CM codes


Mary M. Hulihan, DrPH
Centers for Disease Control and Prevention, 4770 Buford Highway Atlanta, GA 30341