UCI Heart Disease Data Analysis

Introduction

The UCI Heart Disease Data Analysis Dashboard presents a comprehensive analytical exploration of cardiovascular risk factors, demographic patterns, and disease severity indicators using the well-established UCI Heart Disease dataset. Built on the Dashtera no-code analytics platform, the dashboard enables interactive visualization of 920 patient records across four international study centers. 

Through histograms, boxplots, correlation charts, distribution summaries, and multivariate visualizations, the system provides researchers, clinicians, and analysts with structured insights into the clinical attributes contributing to heart disease. 

The dashboard supports evidence-driven understanding of population characteristics, prevalence differences by demographic groups, risk factor patterns, and the relationships between clinical measurements and disease outcomes. 

Dataset

The dataset used in this project originates from the publicly available UCI Heart Disease Database, with the records sourced from four medical research centers: 

  • Cleveland Clinic Foundation (USA) 
  • Hungarian Institute of Cardiology (Hungary) 
  • University Hospitals of Zurich and Basel (Switzerland) 
  • VA Long Beach Medical Center (USA) 

The dataset includes 920 individual patient observations, each annotated with demographic details, symptomatic information, laboratory measurements, and diagnostic attributes linked to heart disease severity. 

Core Variables 

The dataset spans a diverse range of clinical and demographic factors: 

  • Demographics: age, sex, dataset source 
  • Symptoms & Observations: chest pain type, resting ECG result, ST depression, exercise-induced angina 
  • Vital Signs & Measurements: resting blood pressure, cholesterol, maximum heart rate 
  • Anatomical Data: slope of ST segment, number of major vessels (ca), thalassemia type 
  • Outcome Variable: heart disease severity (num: 0–4) 

The combination of demographic, symptomatic, and physiological attributes makes this dataset clinically meaningful for detecting patterns and assessing early disease risk. 

Key characteristics of the cleaned dataset include: 

  • Total Records: 920 
  • Sex Distribution: 79% Male, 21% Female 
  • Data Sources: Cleveland (304), Hungary (293), Switzerland (123), VA Long Beach (200) 
  • Heart Disease Severity Classes (num): 
    • 0 = No Heart Disease 
    • 1 = Mild 
    • 2 = Moderate 
    • 3 = Severe 
    • 4 = Very Severe 

Missing Value Treatment 

  • Null values were filled using the mean for continuous variables and the mode for categorical variables. 
  • Zero values were preserved, as many represent valid clinical readings. 
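A minimal sketch of the imputation rule above, written in plain Python so it runs without extra packages. It assumes each record is a dict in which `None` marks a missing value. The column names `chol` and `slope` are illustrative, and which columns count as continuous or categorical is an assumption, not something the dashboard documents. Zeros are treated as observed values, so they are kept and also enter the mean.

```python
from statistics import mean, mode


def impute(records, continuous, categorical):
    """Return copies of `records` with None replaced by the column mean
    (continuous columns) or the column mode (categorical columns).

    Zeros are ordinary observed values here: they are kept as-is and are
    included when the mean is computed. A column with no observed values
    at all would make mean()/mode() raise StatisticsError.
    """
    fills = {}
    for col in continuous:
        fills[col] = mean(r[col] for r in records if r[col] is not None)
    for col in categorical:
        fills[col] = mode(r[col] for r in records if r[col] is not None)
    # Build new dicts so the input records are left unchanged.
    return [
        {k: (fills[k] if v is None and k in fills else v) for k, v in r.items()}
        for r in records
    ]


rows = [
    {"chol": 240, "slope": "flat"},
    {"chol": 0, "slope": None},       # zero cholesterol is kept, not imputed
    {"chol": None, "slope": "flat"},
    {"chol": 180, "slope": "upsloping"},
]
filled = impute(rows, continuous=["chol"], categorical=["slope"])
print(filled[2]["chol"], filled[1]["slope"])
```

In pandas the same rule is typically written per column with `fillna`, passing `df[col].mean()` or `df[col].mode()[0]`.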

The final dataset offers a reliable and consistent foundation for advanced visualization and statistical interpretation. 

About Dashtera

Dashtera is a cloud-based, no-code analytics platform designed to support the visual exploration and analysis of complex datasets. The platform enables users to construct interactive dashboards without programming, allowing for efficient examination of multidimensional data through line plots, bar charts, maps, regressions, and statistical summaries. Its interface allows data to be filtered, compared, and inspected from multiple perspectives, which makes it suitable for exploratory data analysis tasks. 

Key Features 

  • Integration with multiple data sources, including CSV files, APIs, and external repositories. 
  • Support for a wide range of visualization types, such as line charts, bar charts, Pareto charts, and geographic maps. 
  • Interactive drill-down capabilities for detailed examination of specific data segments. 
  • Dynamic filtering that enables focused analysis based on selected criteria. 
  • Built-in options for sharing dashboards to facilitate collaborative research and analysis. 

Dashboard

Demographics & Risk Factors Overview

The first dashboard presents the demographic makeup and baseline clinical measurements of the 920 individuals in the dataset. 

[Figure: Demographics & Risk Factors Overview dashboard]

The age, resting blood pressure, cholesterol, and maximum heart rate histograms all show approximately normal distributions, indicating that these physiological measures broadly align with expected population-level behavior. In contrast, the ST depression (oldpeak) distribution is right-skewed and gamma-like: most individuals have low ST depression values, while a smaller subset presents with higher, clinically significant deviations. 
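One way to check the "near-normal versus gamma-like" reading numerically is the sample skewness g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments. A value near zero suggests a roughly symmetric histogram, while a clearly positive value indicates the long right tail described for `oldpeak`. This is an illustrative sketch, not part of the dashboard, and the sample values are invented.

```python
def skewness(values):
    """Fisher-Pearson sample skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
oldpeak_like = [0, 0, 0, 0, 0.5, 1, 1, 2, 3.5]  # invented, right-tailed sample
print(skewness(symmetric))     # 0.0
print(skewness(oldpeak_like))  # positive: long right tail
```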

The population is predominantly male (79% male, 21% female), a strong sex imbalance. Fasting blood sugar (recorded as above or below 120 mg/dl) is normal in 782 individuals (85%) and elevated in 138 (15%), so abnormal fasting glucose is comparatively uncommon. Exercise-induced angina is absent in the majority (583, about 63%) but present in a substantial minority (337, about 37%), suggesting meaningful variation in exercise-related cardiac symptoms. 

Gauge charts of the most recent patient’s readings place key cardiovascular indicators (resting blood pressure, serum cholesterol, maximum heart rate, and ST depression) within the broader dataset distribution, offering intuitive reference points for clinical interpretation. 
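How Dashtera positions a value on its gauges is not described here. One plausible way to express "where a reading sits in the cohort" is a percentile rank; the hypothetical helper below returns the share of cohort values below a reading, counting exact ties as half. The cohort values are invented.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sorted_values, x):
    """Percentage of cohort values below x, counting values equal to x as half.

    `sorted_values` must be sorted in ascending order.
    """
    below = bisect_left(sorted_values, x)            # values strictly less than x
    equal = bisect_right(sorted_values, x) - below   # values equal to x
    return 100.0 * (below + 0.5 * equal) / len(sorted_values)


resting_bp = sorted([100, 110, 120, 130, 140])  # invented cohort values, mmHg
print(percentile_rank(resting_bp, 120))  # 50.0
```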

Categorical comparisons further highlight several important patterns. Age categorization shows that senior (375) and elderly (222) individuals form the largest cohorts, while very elderly individuals are a much smaller group (31). Chest pain type is dominated by asymptomatic presentations (496, about 54%), with fewer cases of non-anginal pain (204), atypical angina (174), and typical angina (46). Among ST-segment slope patterns, the flat slope is most common (654), followed by upsloping (203) and downsloping (63). Records are spread across all four sources, ranging from 123 (Switzerland) to 304 (Cleveland) cases. 
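Age categories like these come from binning a continuous variable. The source does not give the cut points, and labels other than senior, elderly, and very elderly are not named, so the bands below are placeholders chosen only to show the mechanics.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut points: below 40 is "young", 40-54 "middle-aged",
# 55-64 "senior", 65-74 "elderly", 75 and above "very elderly".
CUTS = [40, 55, 65, 75]
LABELS = ["young", "middle-aged", "senior", "elderly", "very elderly"]


def age_band(age):
    # bisect_right counts how many cut points are <= age, which is the band index
    return LABELS[bisect_right(CUTS, age)]


ages = [38, 52, 58, 61, 67, 76]  # invented ages
print(Counter(age_band(a) for a in ages))
```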

Finally, the distribution of major vessels colored by fluoroscopy shows a heavy skew toward zero-vessel involvement (792 individuals), with progressively fewer cases for one (67), two (41), and three vessels (20), indicating that significant arterial involvement is present but not predominant in this dataset. 

Disease Patterns & Risk Relationships 

The second dashboard examines how demographic, behavioral, and clinical predictors relate to the target disease outcome.

[Figure: Disease Patterns & Risk Relationships dashboard]

The relationship between age categories and heart disease severity shows a clear progression: senior and elderly groups contain the highest proportions of mild to severe heart disease cases, while younger groups exhibit largely non-disease or mild-disease profiles. For example, seniors contribute 118 mild, 46 moderate, and 37 severe cases, illustrating the strong, progressive association between age and disease severity. 

Vertical bar charts reveal distinct sex-based patterns. Most female patients have no heart disease (144) and relatively few have severe disease, whereas male patients have substantially higher counts in every disease category, including 235 mild, 99 moderate, and 99 severe cases, underscoring sex as a strong differentiator in cardiac risk. 
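Because men make up about 79% of the records, raw counts mix group size with how disease is distributed within each group. Normalizing each row of the sex × `num` cross-tabulation to proportions makes the groups directly comparable. A minimal standard-library sketch; the toy rows are invented and the `sex` key is an assumed column name.

```python
from collections import Counter, defaultdict


def crosstab(rows, group_key, outcome_key="num"):
    """Count outcomes within each group: {group: Counter({outcome: count})}."""
    table = defaultdict(Counter)
    for r in rows:
        table[r[group_key]][r[outcome_key]] += 1
    return table


def row_proportions(table):
    """Divide each group's counts by that group's total, so each row sums to 1."""
    return {
        group: {outcome: n / sum(counts.values()) for outcome, n in counts.items()}
        for group, counts in table.items()
    }


# Invented rows: men outnumber women 2:1; num 0 = no disease, 1 = mild
rows = (
    [{"sex": "Female", "num": 0}] * 3 + [{"sex": "Female", "num": 1}]
    + [{"sex": "Male", "num": 0}] * 4 + [{"sex": "Male", "num": 1}] * 4
)
print(row_proportions(crosstab(rows, "sex")))
```

With pandas, `pd.crosstab(df["sex"], df["num"], normalize="index")` produces the same row-normalized table.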

Fasting blood sugar status shows a similar gradient. Individuals with normal fasting sugar account for the majority of non-disease cases (367) yet still include 80 severe cases. Elevated fasting sugar is associated with proportionally more moderate and severe disease (23 moderate and 27 severe among 138 individuals, so about 20% severe versus about 10% in the normal group), supporting its role as a metabolic risk amplifier. 

Exercise-induced angina displays one of the clearest patterns in the dashboard. Individuals without exercise angina account for most of the non-disease category (356), while those experiencing angina during exertion contribute disproportionately to the moderate (58) and severe (64) disease classes. This aligns with established clinical evidence linking exercise-induced ischemia to more advanced coronary disease. 

Stacked bar charts deepen these insights by illustrating multi-category relationships. Chest pain type reveals that asymptomatic individuals, though the largest group, also harbor many moderate and severe cases. Resting ECG patterns show that ST-T abnormalities and left ventricular hypertrophy correspond to higher disease severity. Slope and vessel count (ca) also display clear gradients: flat slopes and increasing vessel involvement correspond to rising heart disease severity, with the three-vessel group showing the highest proportional severe disease representation. 
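The gradients described for ordinal predictors such as slope and vessel count could be summarized with Spearman's rank correlation against `num`, which measures monotonic association without assuming equal spacing between categories. This is an illustrative addition, not a statistic the dashboard reports. Ties receive averaged ranks, which matters here because ordinal columns consist mostly of ties; the sample values are invented.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


ca = [0, 0, 1, 2, 3, 0, 1]   # invented vessel counts
num = [0, 1, 1, 2, 3, 0, 2]  # invented severity classes
print(spearman(ca, num))
```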

Together, the second dashboard demonstrates coherent, clinically plausible relationships across multiple predictors, illustrating how demographic and functional variables interact with disease intensity. 

Correlations and Multivariate Insights

The third dashboard explores pairwise and multivariate relationships to reveal deeper interaction patterns within the dataset.

[Figure: Correlations and Multivariate Insights dashboard]

Scatter plots show a positive trend between age and resting blood pressure, consistent with physiological arterial stiffening over time. Age also shows a generally increasing pattern with serum cholesterol, while its relationship with maximum heart rate is clearly negative: older individuals reach markedly lower maximum heart rates during exertion. Age and ST depression exhibit an upward trend, suggesting increased ischemic burden with advancing age. 

Correlations among physiological measures reveal consistent patterns: resting blood pressure correlates positively with serum cholesterol, while both show slight negative associations with maximum heart rate. Serum cholesterol displays a modest but consistent upward relationship with ST depression, indicating that lipid abnormalities may contribute to ischemic changes. Maximum heart rate shows a downward trend as ST depression increases, reinforcing the clinical association between impaired exercise capacity and ischemic response. 
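Pairwise trends like these are commonly quantified with Pearson's correlation coefficient. The sketch below computes it for every pair of numeric columns. The column names follow a common distribution of the UCI data (`trestbps`, `chol`, `thalch`, `oldpeak`) and are assumptions here; the dashboard may present only the scatter plots. The sample values are invented.

```python
from itertools import combinations_with_replacement


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_matrix(columns):
    """Map every (name_a, name_b) pair, in both orders, to its Pearson r."""
    matrix = {}
    for a, b in combinations_with_replacement(list(columns), 2):
        r = pearson(columns[a], columns[b])
        matrix[(a, b)] = matrix[(b, a)] = r
    return matrix


columns = {
    "age": [40, 50, 60, 70],       # invented values
    "thalch": [170, 160, 145, 130],
}
corr = correlation_matrix(columns)
print(corr[("age", "thalch")])  # negative for this invented sample
```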

The 3D visualizations provide an integrated view of multivariate dynamics. The relationship between age × resting blood pressure × serum cholesterol reveals a clustered progression where older individuals tend to simultaneously exhibit higher blood pressure and elevated cholesterol. The second 3D model, integrating resting blood pressure, cholesterol, and maximum heart rate, shows that individuals with both high cholesterol and high blood pressure tend to occupy lower maximum-heart-rate regions, highlighting compounding cardiovascular stress effects. 
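The "compounding stress" reading of the second 3D view can be checked without a 3D plot by comparing mean maximum heart rate between patients high on both blood pressure and cholesterol and everyone else. The thresholds (140 mmHg, 240 mg/dl) and the column names `trestbps`, `chol`, and `thalch` are assumptions for this sketch; the dashboard does not state any cut-offs, and the rows are invented.

```python
from statistics import mean

# Assumed cut-offs for "high"; not taken from the dashboard.
HIGH_BP = 140    # mmHg, resting blood pressure (trestbps)
HIGH_CHOL = 240  # mg/dl, serum cholesterol (chol)


def mean_max_hr_by_risk(rows):
    """Mean maximum heart rate for 'both high' patients versus all others."""
    both, rest = [], []
    for r in rows:
        high = r["trestbps"] >= HIGH_BP and r["chol"] >= HIGH_CHOL
        (both if high else rest).append(r["thalch"])
    return {
        "both_high": mean(both) if both else None,
        "other": mean(rest) if rest else None,
    }


rows = [  # invented records
    {"trestbps": 150, "chol": 260, "thalch": 120},
    {"trestbps": 145, "chol": 250, "thalch": 130},
    {"trestbps": 120, "chol": 200, "thalch": 160},
    {"trestbps": 150, "chol": 200, "thalch": 150},  # high BP only
]
print(mean_max_hr_by_risk(rows))
```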

Overall, the third dashboard illustrates coherent interaction patterns across key metabolic, hemodynamic, and functional indicators, reinforcing the multi-factorial nature of cardiovascular risk and highlighting how structural trends align with clinical expectations. 

Discussion

The three-page report highlights several clinically meaningful patterns. 

First, the demographic structure reveals a heavily male-dominated dataset, which aligns with the historical underrepresentation of women in cardiac studies. Older age groups, particularly the Senior and Elderly categories, show disproportionately high rates of heart disease. 

Second, key risk factors such as resting blood pressure, cholesterol, and ST depression follow biologically consistent distribution patterns (normal or gamma-shaped), supporting the reliability of the dataset. 

Third, the second dashboard demonstrates the progressive escalation of clinical abnormalities with increasing disease severity. Variables such as chest pain type, slope of the ST segment, and vessel blockage show strong, interpretable associations with heart disease risk. 

Finally, the multivariate charts on the third dashboard uncover deeper correlations, including the interplay between age, cholesterol, heart rate, and blood pressure. The negative relationship between age and maximum heart rate is particularly prominent, as is the clustering of high-risk patients with multiple elevated measurements. 

Together, these dashboards provide a structured analytical framework for visualizing cardiovascular risk. 

Conclusion

The UCI Heart Disease Data Analysis Dashboard offers a rigorous, multi-dimensional platform for exploring demographic factors, clinical measurements, and heart disease outcomes. Using Dashtera’s interactive analytics environment, the dashboard enables: 

  • Population-level demographic analysis 
  • Risk factor distribution assessment 
  • Disease relationship modeling 
  • Multivariate and correlation-based exploration 

The dashboard supports clinicians, data analysts, and medical students in understanding the interplay of cardiovascular parameters and provides a visual basis for detecting emergent risk patterns. Its structured insights can contribute to early detection strategies, improved diagnostic processes, and public health research. 
