Stroke Prediction Data Analysis Dashboard

Published December 1, 2025

Data Analytics Software QA Janaka Alwis

Introduction

The Stroke Prediction Data Analysis Dashboard was developed using the Dashtera no-code analytics platform to analyze demographic, clinical, and lifestyle factors contributing to stroke risk. Using a dataset of 5,110 patients containing demographic attributes (gender, age, residence, marital status), health conditions (hypertension, heart disease, BMI, glucose level), and lifestyle indicators (smoking, work type), the dashboard enables interactive exploration of risk trends and outcome associations.

By transforming raw health data into dynamic, multidimensional visualizations, this dashboard helps clinicians, data scientists, and public health researchers identify stroke risk patterns, compare demographic segments, and evaluate the relationship between medical and lifestyle factors. Dashtera’s powerful visual analytics features support rapid insights into the most critical predictors of stroke occurrence.

Dataset

The dataset originates from an open health records collection used for stroke prediction research. Each record represents an individual patient, with labeled information indicating whether the person experienced a stroke. The data integrates clinical, demographic, and behavioral dimensions, making it highly suitable for both descriptive analytics and predictive modeling.

Dataset Variables Include:

Demographic: gender, age, age_group, Residence_type, ever_married, work_type
Clinical: hypertension, heart_disease, avg_glucose_level, bmi, bmi_group
Lifestyle: smoking_status
Outcome: stroke (binary)
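
The derived fields age_group and bmi_group are presumably binned versions of age and bmi; the article does not state the bin edges. The following is a minimal sketch of such binning, assuming the standard WHO adult BMI cut-offs (18.5, 25, 30) and purely hypothetical age bands. The function names and labels are illustrative and are not Dashtera features.

```python
from bisect import bisect_right

# Assumed cut-offs: the article does not state the bin edges it used.
BMI_EDGES = [18.5, 25.0, 30.0]            # standard WHO adult categories
BMI_LABELS = ["Underweight", "Normal", "Overweight", "Obese"]
AGE_EDGES = [18, 35, 55, 70]              # hypothetical age bands
AGE_LABELS = ["0-17", "18-34", "35-54", "55-69", "70+"]


def to_group(value, edges, labels):
    """Return the label of the half-open bin [lower edge, upper edge) containing value."""
    return labels[bisect_right(edges, value)]


def bmi_group(bmi):
    return to_group(bmi, BMI_EDGES, BMI_LABELS)


def age_group(age):
    return to_group(age, AGE_EDGES, AGE_LABELS)


# The dataset's average BMI and average age, taken from the highlights in this article.
print(bmi_group(28.86), age_group(43))
```

With these assumed edges, the dataset's average BMI of 28.86 falls in the "Overweight" bin and the average age of 43 in the "35-54" band.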

With a stroke prevalence of 4.87% (249 out of 5,110 cases), the dataset is heavily imbalanced toward non-stroke records. It still provides a workable foundation for risk analysis, and the imbalance itself reflects the rarity and clinical significance of stroke events.
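
The prevalence figure can be checked directly from the two counts quoted above. The short calculation below also derives the ratio of non-stroke to stroke records, which is worth keeping in mind for any later predictive modeling; the variable names are illustrative.

```python
total_patients = 5110
stroke_cases = 249

non_stroke = total_patients - stroke_cases
prevalence_pct = 100 * stroke_cases / total_patients
non_stroke_pct = 100 * non_stroke / total_patients
imbalance_ratio = non_stroke / stroke_cases  # non-stroke records per stroke record

print(f"{prevalence_pct:.2f}% stroke, {non_stroke_pct:.2f}% non-stroke")
print(f"about {imbalance_ratio:.1f} non-stroke records per stroke record")
```

This gives 4.87% stroke versus 95.13% non-stroke, or roughly 19.5 non-stroke records for every stroke record.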

Description

Key dataset highlights include:

Total Records: 5,110 patients
Stroke Cases: 249 (4.87%)
Average Age: 43 years
Average BMI: 28.86
Average Glucose Level: 106.15 mg/dL
Gender Distribution: 2,116 males and 2,994 females

The dataset spans a diverse population across urban and rural residences, capturing lifestyle diversity (work type, smoking status) and chronic health burdens (hypertension and heart disease). It is structured to enable correlation between medical risk factors and stroke occurrence through visual analytics.

About Dashtera

Dashtera is a cloud-based, no-code analytics platform designed to support the visual exploration and analysis of complex datasets. The platform enables users to construct interactive dashboards without programming, allowing for efficient examination of multidimensional data through line plots, bar charts, maps, regressions, and statistical summaries. Its interface allows data to be filtered, compared, and inspected from multiple perspectives, which makes it suitable for exploratory data analysis tasks.

Key Features

Integration with multiple data sources, including CSV files, APIs, and external repositories.

Support for a wide range of visualization types, such as line charts, bar charts, Pareto charts, and geographic maps.

Interactive drill-down capabilities for detailed examination of specific data segments.

Dynamic filtering that enables focused analysis based on selected criteria.

Built-in options for sharing dashboards to facilitate collaborative research and analysis.

Relevance to This Project

Dashtera’s visual workflow allowed this project to integrate multiple chart types — histograms, box plots, parallel coordinate charts, heatmaps, and donut charts — across three interactive pages. This provides a layered view from overall trends to individual risk profiles, enabling users to transition seamlessly from overview to detail.

Dashboards

Overview

The Overview page provides a foundational understanding of the dataset by summarizing population metrics and general health distributions. At the top of the dashboard, a set of Key Performance Indicators (KPIs) presents aggregated metrics for the entire cohort: 5,110 total patients and 249 confirmed stroke cases, a prevalence of 4.87%. The mean patient age is 43 years, with an average BMI of 28.86 and an average glucose level of 106.15 mg/dL. These indicators establish the baseline clinical context for the subsequent analyses.

A series of histograms illustrate how age, glucose level, and BMI vary across the overall population, as well as between stroke and non-stroke subgroups. The age distribution demonstrates a clear right-skew, with stroke incidence increasing steadily in older groups, particularly after age 55. Glucose level distributions reveal that patients with elevated glucose concentrations show a noticeably higher proportion of stroke cases, aligning with known metabolic risk correlations. Similarly, BMI histograms show moderate shifts toward higher BMI values among stroke patients, although the relationship is less pronounced than for glucose.
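
The comparison behind the age histogram can be sketched as a per-bin stroke rate: count patients and stroke cases in each age bin, then divide. The records and the 20-year bin width below are hypothetical and serve only to illustrate the calculation; they are not taken from the dataset or from Dashtera's settings.

```python
from collections import defaultdict

# Hypothetical (age, stroke) pairs, purely for illustration; not dataset values.
records = [(23, 0), (31, 0), (42, 0), (47, 1), (58, 0),
           (61, 1), (66, 1), (72, 0), (79, 1), (81, 1)]

BIN_WIDTH = 20  # assumed histogram bin width in years


def stroke_rate_by_age_bin(rows, width=BIN_WIDTH):
    """Map each bin's lower edge to (patients, stroke cases, stroke rate)."""
    totals = defaultdict(int)
    cases = defaultdict(int)
    for age, stroke in rows:
        lower = int(age // width) * width
        totals[lower] += 1
        cases[lower] += stroke
    return {lo: (totals[lo], cases[lo], cases[lo] / totals[lo]) for lo in sorted(totals)}


for lower, (n, k, rate) in stroke_rate_by_age_bin(records).items():
    print(f"{lower}-{lower + BIN_WIDTH - 1}: {k}/{n} = {rate:.0%}")
```

Reporting the rate alongside the raw counts matters here because older bins may contain fewer patients overall, so a histogram of counts alone can understate how quickly risk rises with age.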

Complementing these histograms, donut charts summarize categorical distributions. Gender composition shows 2,116 males and 2,994 females, indicating a modest female predominance. Stroke distribution, represented as another donut chart, highlights the imbalance between affected (4.87%) and unaffected (95.13%) populations, emphasizing the rarity yet clinical importance of stroke events in this dataset.

The box plots (Age by Glucose Level, Glucose Level by Age Groups, and Age by BMI Groups) provide relational context. These bivariate plots indicate a positive trend between aging and glucose elevation, which together form a compound risk pattern that likely amplifies stroke susceptibility. The Overview dashboard thus serves as a statistical baseline, guiding the viewer toward deeper exploration of contributing factors.

Risk Factors

The second dashboard focuses on the disaggregation of stroke risk across medical and lifestyle determinants. A sequence of bar charts visualizes the prevalence of stroke under different categorical conditions — gender, hypertension, heart disease, glucose levels, BMI, age, smoking status, marital status, work type, and residence type. Each chart displays annotated frequencies, enabling quantitative comparison between subgroups.

The hypertension and heart disease variables show the strongest differentiation: stroke occurrence is markedly higher among hypertensive and cardiac patients than among those without such conditions. This observation is consistent with the established cardiovascular-stroke comorbidity pathway. Similarly, the bar charts for average glucose level and BMI groups point to a metabolic contribution to stroke risk, as both hyperglycemic and obese individuals show increased incidence.

The variable ever_married displays an interesting sociological gradient, where married individuals show higher stroke rates. However, this relationship is most likely age-mediated rather than causal, since older participants are also more likely to be married. Work type and residence type further contextualize lifestyle exposure: private-sector employees and rural residents exhibit slightly higher stroke proportions, potentially reflecting occupational and access-related disparities in health management.
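
One way to check whether the marital-status gap is age-mediated is to compare stroke rates within age bands rather than across the whole cohort. The sketch below uses hypothetical counts, chosen only to show how a large crude difference can disappear after stratification; they are not values from the dataset.

```python
# Hypothetical counts, chosen only to illustrate age confounding; not dataset values.
# (age band, ever_married) -> (patients, stroke cases)
counts = {
    ("under 50", "No"): (900, 9),
    ("under 50", "Yes"): (600, 6),
    ("50 and over", "No"): (100, 8),
    ("50 and over", "Yes"): (1400, 112),
}


def rate(patients, cases):
    return cases / patients


def crude_rate(married):
    """Stroke rate for one marital status, ignoring age."""
    patients = sum(n for (_, m), (n, _) in counts.items() if m == married)
    cases = sum(k for (_, m), (_, k) in counts.items() if m == married)
    return rate(patients, cases)


print(f"crude: married {crude_rate('Yes'):.1%}, unmarried {crude_rate('No'):.1%}")
for band in ("under 50", "50 and over"):
    yes = rate(*counts[(band, "Yes")])
    no = rate(*counts[(band, "No")])
    print(f"{band}: married {yes:.1%}, unmarried {no:.1%}")
```

In this constructed example the crude rates are 5.9% for married and 1.7% for unmarried patients, yet within each age band the two groups have identical rates. Applying the same stratified comparison to the real data (for example, by filtering the dashboard on age_group) would show how much of the observed gap remains once age is held roughly constant.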

The box plots and scatter plots provide statistical precision beyond the categorical charts. The Stroke by Age box plot confirms median age elevation among stroke cases. The BMI vs. Average Glucose scatter chart, color-coded by stroke outcome, visually clusters stroke patients in the high-BMI/high-glucose quadrant, highlighting synergistic risk interactions. A 3D point chart mapping Age × BMI × Average Glucose adds a multivariate view, demonstrating that patients exceeding thresholds in all three variables form a distinct, high-risk cluster.

Overall, the Risk Factors page elucidates how stroke likelihood scales with physiological stressors (hypertension, glucose, BMI) and demographic maturity (age), validating the use of these variables in predictive modeling frameworks.

Demographics

The third dashboard expands the analysis to demographic, clinical, and lifestyle dimensions using composite and multivariate visualization techniques. It begins with KPI cards for key demographic groups: female (2.76%), male (2.11%), rural (2.64%), urban (2.23%), married (4.31%), and unmarried (0.57%). Each pair sums to roughly the overall prevalence of 4.87%, so these values appear to be the stroke cases in each group as a share of all 5,110 patients rather than stroke rates within each group. Read that way, the larger female share partly reflects the larger number of women in the dataset (2,994 versus 2,116 men), and the much larger married share is, again, likely age-related.
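
Because the female and male KPI values add up to the overall 4.87% prevalence, they can be read as shares of the full cohort. Under that assumption, and using the gender counts from the Overview page, the within-group stroke rates can be recovered as sketched below. The function name is illustrative, and the case counts are approximations reconstructed from rounded percentages.

```python
TOTAL = 5110
group_size = {"female": 2994, "male": 2116}        # gender counts from the Overview page
cohort_share_pct = {"female": 2.76, "male": 2.11}  # KPI values from this page


def within_group_rate(share_pct, size, total=TOTAL):
    """Convert a cohort share (%) into (approximate cases, within-group rate in %).

    Assumption: the KPI is stroke cases in the group divided by all patients.
    """
    cases = round(share_pct / 100 * total)
    return cases, 100 * cases / size


for group, share in cohort_share_pct.items():
    cases, rate_pct = within_group_rate(share, group_size[group])
    print(f"{group}: ~{cases} cases, within-group rate {rate_pct:.2f}%")
```

Under this reading, roughly 141 of 2,994 women (4.71%) and 108 of 2,116 men (5.10%) had a stroke, so the within-group rate is slightly higher for men even though women account for the larger share of all cases.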

The highlight of this page is the integration of Parallel Coordinate Set charts, which provide multidimensional pattern visualization across categorical and numerical factors.

Three distinct analytical perspectives are presented:

Clinical Risk Profile:
This chart sequences Age Group → Hypertension → Heart Disease → Glucose Group → BMI Group. It reveals that most stroke cases align along paths with older age groups, elevated glucose, and hypertension, forming a clinically coherent risk trajectory.
Demographic Influence:
Constructed as Gender → Age Group → Residence Type → Ever Married → Work Type, this visualization captures socio-demographic stratifications. It shows that stroke clusters most prominently among older, married, rural males working in private employment — a pattern consistent with aging and lifestyle risk accumulation.
Lifestyle and Health Mix:
This combines Smoking Status → BMI Group → Glucose Group → Hypertension → Stroke, exposing how unhealthy habits and metabolic conditions jointly impact stroke outcomes. The presence of “formerly smoked” individuals in moderate-to-high BMI and glucose clusters suggests long-term exposure effects rather than acute behaviors.
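
Conceptually, a parallel coordinate set chart groups records by their combination of values across an ordered list of axes and draws each combination as a path whose thickness reflects its count. The sketch below reproduces that counting step on a handful of hypothetical records; the category labels and axis selection are illustrative and do not mirror the dashboard's exact configuration.

```python
from collections import Counter

# Hypothetical records; labels are illustrative, not the dashboard's exact bins.
patients = [
    {"age_group": "60+", "hypertension": 1, "glucose_group": "High", "stroke": 1},
    {"age_group": "60+", "hypertension": 1, "glucose_group": "High", "stroke": 1},
    {"age_group": "60+", "hypertension": 0, "glucose_group": "Normal", "stroke": 0},
    {"age_group": "40-59", "hypertension": 0, "glucose_group": "Normal", "stroke": 0},
    {"age_group": "40-59", "hypertension": 1, "glucose_group": "High", "stroke": 1},
    {"age_group": "<40", "hypertension": 0, "glucose_group": "Normal", "stroke": 0},
]

AXES = ["age_group", "hypertension", "glucose_group", "stroke"]  # left-to-right axis order


def path_counts(rows, axes=AXES):
    """Count how many records follow each combination of values across the axes."""
    return Counter(tuple(row[a] for a in axes) for row in rows)


for path, n in path_counts(patients).most_common():
    print(" -> ".join(str(v) for v in path), f"(n={n})")
```

Changing the order of AXES changes which adjacent relationships are easiest to read, which is why the three perspectives above sequence their dimensions differently.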

Beyond the coordinate plots, heat maps provide dense two-dimensional representations of combined risk. The Stroke Cases by Age Group × BMI Category heatmap indicates that overweight and obese categories dominate in the 50–70 age range, whereas Age Group × Glucose Level heatmaps show clear high-density regions among older, hyperglycemic individuals.
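
The data behind such a heatmap is a two-way count table: one cell per combination of age group and BMI category, holding the number of stroke cases. A minimal sketch of that aggregation follows, using hypothetical rows and assumed category labels rather than the dashboard's actual bins.

```python
from collections import Counter

# Hypothetical (age_group, bmi_group, stroke) rows, for illustration only.
rows = [
    ("50-59", "Overweight", 1), ("50-59", "Obese", 1), ("50-59", "Normal", 0),
    ("60-69", "Obese", 1), ("60-69", "Obese", 1), ("60-69", "Overweight", 0),
    ("30-39", "Normal", 0), ("30-39", "Obese", 0),
]


def stroke_case_grid(data):
    """Return a Counter of stroke cases keyed by (age_group, bmi_group)."""
    return Counter((age, bmi) for age, bmi, stroke in data if stroke == 1)


grid = stroke_case_grid(rows)
ages = sorted({age for age, _, _ in rows})
bmis = ["Normal", "Overweight", "Obese"]

# Print the grid as a small text table; cells with no stroke cases show 0.
print("age".ljust(8) + "".join(b.ljust(12) for b in bmis))
for age in ages:
    print(age.ljust(8) + "".join(str(grid[(age, b)]).ljust(12) for b in bmis))
```

Since the cells hold raw case counts, a dense cell can reflect either elevated risk or simply a large subgroup; dividing each cell by the number of patients in it would give a rate-based view.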

Finally, donut charts summarize categorical stroke distributions by age, glucose, residence, and BMI levels. These serve as intuitive summaries of the multi-dimensional findings visualized earlier, reinforcing that the majority of stroke cases occur among older, higher-glucose, higher-BMI individuals, independent of residence type.

The Demographics dashboard therefore provides a high-level synthesis of population heterogeneity and inter-factor dependencies, emphasizing that stroke emerges not from isolated causes but from interconnected biological and social determinants.

Summary of the Analytical Findings

Across all three dashboards, the Dashtera-based analysis presents a coherent epidemiological narrative. The Overview establishes foundational patterns; the Risk Factors dashboard quantifies the influence of individual predictors; and the Demographics page integrates these predictors into multi-dimensional frameworks. Collectively, the visualizations reveal that age, hypertension, glucose level, and BMI form the core physiological risk cluster, while smoking status, marital status, and residence provide contextual modifiers.

This integrated, visual-analytic approach demonstrates how no-code platforms like Dashtera can facilitate deep, reproducible, and interpretable insights in public health informatics.

Discussion

The Stroke Prediction Dashboard highlights several critical findings:

Age and glucose are dominant predictive variables for stroke.

Hypertension and heart disease act as compounding risk amplifiers.

Lifestyle factors such as smoking and work type modify clinical risks.

Parallel coordinate analysis provides a holistic understanding of how medical, demographic, and lifestyle elements interact.

By combining statistical and categorical visuals, Dashtera enables both clinicians and analysts to explore population-level and patient-level risk dynamics seamlessly.

Conclusion

The Stroke Prediction Data Analysis Dashboard with Dashtera demonstrates how no-code visual analytics can transform raw health data into actionable insights. With three focused dashboard pages—Overview, Risk Factors, and Demographics—the system supports:

Rapid exploration of key predictors of stroke.

Interactive filtering to isolate demographic or medical groups.

Visual pattern detection across multiple dimensions.

This framework can be extended to predictive modeling, real-time risk monitoring, or integration into clinical decision-support systems. Dashtera’s no-code workflow makes such analytical capabilities efficient and accessible for healthcare research and public health planning.
