Medical Cost Personal Data Analysis Dashboard

Introduction to the Project

The Medical Costs Dashboard presents a comprehensive analytical exploration of healthcare insurance costs, demographic characteristics, behavioral risk factors, and predictive modeling outcomes using the widely referenced Medical Cost Personal Dataset. Developed using the Dashtera no-code business intelligence platform, the dashboard enables interactive exploration of 1,338 individual insurance records.

Through the use of bar charts, histograms, scatter plots, box plots, stacked distributions, and regression visualizations, the dashboard provides structured insights into how demographic attributes, lifestyle behaviors, and physiological indicators influence medical insurance charges.

The system supports data-driven understanding of cost variability, risk segmentation, and predictive performance, enabling analysts, insurers, and decision-makers to identify high-cost profiles and evaluate insurance pricing behavior.

Background and Data Source

The dataset used in this project originates from the publicly available Medical Cost Personal Dataset, commonly used for insurance analytics and regression modeling studies. The data captures anonymized insurance information for individuals residing across four major regions of the United States.

The dataset contains 1,338 policyholder records, each describing demographic attributes, health-related indicators, lifestyle behaviors, and corresponding medical insurance charges billed by providers.

This dataset is particularly suitable for insurance cost analysis due to its combination of demographic, physiological, and behavioral risk factors.

Dataset Description

The dataset's variables fall into the following groups:

  • Demographics: age, sex, region
  • Health Indicators: body mass index (BMI), number of children
  • Lifestyle Behavior: smoking status
  • Derived Risk Metrics: age group, BMI category, risk level
  • Cost Measures: actual charges, predicted charges, residuals

The target variable is medical insurance charges, representing the total billed healthcare cost per individual. Here are the key dataset characteristics:

  • Total Records: 1,338
  • Average Age: 39.2 years
  • Average BMI: 30.7
  • Gender Distribution: 51% Male, 49% Female
  • Smoking Rate: 20.48%
  • Regions Covered: Northeast, Northwest, Southeast, Southwest
  • Charges Range: Approximately 1,120 to 63,770
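
The headline figures above are simple aggregates and can be recomputed from the raw file. The following is a minimal sketch using only the Python standard library. It assumes the commonly distributed column names (`age`, `sex`, `bmi`, `smoker`, `charges`), and the three-row sample is made-up illustration data, not actual records from the dataset.

```python
import csv
import io
from statistics import mean

# Made-up sample in the assumed column layout (not actual dataset rows).
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
20,female,25.0,0,yes,southwest,16000.0
30,male,35.0,1,no,southeast,2000.0
40,male,30.0,3,no,northwest,6000.0
"""

def summarize(rows):
    """Return headline statistics for an iterable of record dicts."""
    rows = list(rows)
    charges = [float(r["charges"]) for r in rows]
    return {
        "total_records": len(rows),
        "average_age": mean(float(r["age"]) for r in rows),
        "average_bmi": mean(float(r["bmi"]) for r in rows),
        "male_share": sum(r["sex"] == "male" for r in rows) / len(rows),
        "smoking_rate": sum(r["smoker"] == "yes" for r in rows) / len(rows),
        "charges_range": (min(charges), max(charges)),
    }

summary = summarize(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(summary)
```

To run this against the full file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle for the downloaded CSV.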

Derived features such as age groups, BMI categories, risk levels, and interaction variables (age × smoker, BMI × smoker) were created to enhance interpretability and predictive modeling.
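
As a rough illustration of how such derived features might be built, the sketch below adds category and interaction columns to a single record. The BMI cut-offs are the standard WHO adult classes, which are assumed here rather than confirmed by the source. The age-group bin edges are hypothetical, since the dashboard names the groups but not their boundaries. The risk-level rule is not described in the source and is therefore left out.

```python
def bmi_category(bmi):
    """Classify BMI using the standard WHO adult cut-offs (assumed)."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal"
    if bmi < 30:
        return "Overweight"
    return "Obese"

# Hypothetical upper bounds: the source names the groups but not their edges.
AGE_BINS = [
    (25, "Young Adult"),
    (35, "Adult"),
    (45, "Middle Age"),
    (55, "Senior Adult"),
]

def age_group(age):
    """Return the first group whose (exclusive) upper bound exceeds `age`."""
    for upper, label in AGE_BINS:
        if age < upper:
            return label
    return "Senior"

def add_derived_features(record):
    """Return a copy of `record` with category and interaction columns added."""
    is_smoker = 1 if record["smoker"] == "yes" else 0
    out = dict(record)
    out["age_group"] = age_group(record["age"])
    out["bmi_category"] = bmi_category(record["bmi"])
    # Interaction terms are zero for non-smokers and equal to the raw value for smokers.
    out["age_x_smoker"] = record["age"] * is_smoker
    out["bmi_x_smoker"] = record["bmi"] * is_smoker
    return out

example = add_derived_features({"age": 52, "bmi": 31.4, "smoker": "yes"})
print(example)
```

Encoding the interactions as products with a 0/1 smoker flag lets a linear model learn a separate age slope and BMI slope for smokers.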

Dashtera

Dashtera is a cloud-based, no-code analytics platform designed to support the visual exploration and analysis of complex datasets. The platform enables users to construct interactive dashboards without programming, allowing for efficient examination of multidimensional data through line plots, bar charts, maps, regressions, and statistical summaries.

Dashtera’s interface allows data to be filtered, compared, and inspected from multiple perspectives, which makes it suitable for exploratory data analysis tasks.

Key Features

  • Integration with multiple data sources, including CSV files, APIs, and external repositories.
  • Support for a wide range of visualization types, such as line charts, bar charts, Pareto charts, and geographic maps.
  • Interactive drill-down capabilities for detailed examination of specific data segments.
  • Dynamic filtering that enables focused analysis based on selected criteria.
  • Built-in options for sharing dashboards to facilitate collaborative research and analysis.

This flexibility also makes Dashtera well suited to presenting predictive insights in healthcare and insurance domains.

Dashboard Analysis

The dashboard is organized into three pages: Customer Profile, Risk Analysis, and Prediction & Model Performance. Each page is described below.

Customer Profile Dashboard

The first dashboard, Customer Profile, focuses on the demographic composition, health indicators, and baseline cost distribution of the insured population.

[Figure: Customer Profile dashboard]

KPI indicators summarize the dataset at a glance, highlighting 1,338 total policies, an average medical charge of 13,270, and an average predicted charge of 13,229. The two averages differ by less than 1%, so the model's mean prediction closely matches the observed mean.

The population has an average age of 39.2 years and an average BMI of 30.7, just above the conventional obesity threshold of 30. Consistent with this, most individuals fall in the overweight or obese range.

Age group distribution is relatively balanced across Young Adult, Adult, Middle Age, Senior Adult, and Senior categories, with each group contributing between 216 and 306 records.

The age histogram demonstrates a broadly uniform distribution across adulthood, indicating that cost trends are not driven by age concentration alone.

Children count distribution shows that 574 individuals have no children, while small families (1–2 children) account for a substantial portion of the dataset. Large families (3–5 children) represent a smaller but notable segment.

BMI category analysis reveals that obesity dominates the dataset, with 707 individuals classified as obese and only 225 falling within the normal BMI range. This highlights a significant population-level health risk factor.

Region-wise policy counts are evenly distributed across the four US regions, ensuring geographic balance. Average age by region varies minimally, suggesting comparable demographic profiles nationwide.

Risk level distribution indicates that Medium Risk individuals form the largest group (691), followed by Low Risk (502) and High Risk (145). Smoking behavior shows that only 20% of individuals are smokers, yet this group plays a disproportionate role in cost escalation.

The charges category distribution reveals that most individuals fall into Medium Cost and Low Cost brackets, while Very High Cost cases represent a smaller but financially critical subset. The charges box plot highlights a right-skewed distribution with several high-cost outliers, a typical characteristic of healthcare expenditure data.
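
The box-plot reading above can be checked numerically. The sketch below computes quartiles, the usual 1.5 × IQR whisker fences, and a simple skew indicator (mean greater than median). The charge values are made up for illustration, and the quartile method is Python's default, which may differ slightly from the one a given charting tool uses.

```python
from statistics import mean, median, quantiles

def box_plot_summary(values):
    """Quartiles, upper whisker fence, and outliers under the 1.5 * IQR rule."""
    q1, q2, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "upper_fence": upper_fence,
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
        # A mean above the median is a simple sign of right skew.
        "mean_exceeds_median": mean(values) > median(values),
    }

# Made-up charges: mostly moderate values plus one extreme case.
charges = [1200, 2500, 4000, 5500, 7000, 9000, 12000, 48000]
box = box_plot_summary(charges)
print(box)
```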

Risk Analysis Dashboard

This dashboard explores how age, BMI, and smoking interact to influence insurance costs.

[Figure: Risk Analysis dashboard]

Scatter plots comparing charges versus age reveal stark differences between smokers and non-smokers. Non-smokers show a gradual and moderate increase in costs with age, whereas smokers exhibit a steep upward cost trajectory, particularly in later age ranges.
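
One way to quantify the difference between the two trajectories is to fit a separate least-squares line of charges against age for each group and compare the slopes. The sketch below does this with plain Python. The records are synthetic and constructed to lie exactly on two lines, so the numbers illustrate the method only and say nothing about the real dataset.

```python
from collections import defaultdict

# Synthetic records lying exactly on two lines (illustration only).
AGES = [20, 30, 40, 50, 60]
RECORDS = (
    [{"age": a, "smoker": "no", "charges": 200 * a} for a in AGES]
    + [{"age": a, "smoker": "yes", "charges": 300 * a + 15000} for a in AGES]
)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def slopes_by_smoker(records):
    """Return the fitted charges-per-year-of-age slope for each smoker group."""
    groups = defaultdict(list)
    for r in records:
        groups[r["smoker"]].append((r["age"], r["charges"]))
    return {
        group: fit_line([age for age, _ in pts], [ch for _, ch in pts])[0]
        for group, pts in groups.items()
    }

slopes = slopes_by_smoker(RECORDS)
print(slopes)
```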

Age group–wise stacked cost distributions illustrate a clear progression: Young Adults are predominantly low-cost, while Senior Adult and Senior groups show increasing concentrations of high and very high cost categories.

BMI-based analysis shows similar patterns. Non-smokers display modest cost variation across BMI values, whereas smokers experience sharply increasing costs as BMI rises. Stacked bar charts confirm that obese individuals contribute disproportionately to high and very high cost categories.
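
The stacked distributions described above amount to row-normalized cross-tabulations: for each group, the share of records falling into each cost category. A minimal sketch follows. The field names `bmi_category` and `charges_category` and the sample labels are assumptions for illustration, since the source does not state how the cost brackets are defined.

```python
from collections import Counter, defaultdict

def stacked_shares(records, group_key, stack_key):
    """For each group, return the fraction of its records in each stack category."""
    counts = defaultdict(Counter)
    for r in records:
        counts[r[group_key]][r[stack_key]] += 1
    return {
        group: {cat: n / sum(ctr.values()) for cat, n in ctr.items()}
        for group, ctr in counts.items()
    }

# Made-up labels; the real cost-bracket thresholds are not given in the source.
records = [
    {"bmi_category": "Normal", "charges_category": "Low Cost"},
    {"bmi_category": "Normal", "charges_category": "Low Cost"},
    {"bmi_category": "Normal", "charges_category": "Low Cost"},
    {"bmi_category": "Normal", "charges_category": "Medium Cost"},
    {"bmi_category": "Obese", "charges_category": "Low Cost"},
    {"bmi_category": "Obese", "charges_category": "High Cost"},
    {"bmi_category": "Obese", "charges_category": "High Cost"},
    {"bmi_category": "Obese", "charges_category": "Very High Cost"},
]

shares = stacked_shares(records, "bmi_category", "charges_category")
print(shares)
```

The same function works for the age-group view by passing a hypothetical `age_group` field as `group_key`.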

The 3D visualizations (BMI × Age × Charges) provide integrated insights into multivariate risk. Among non-smokers, cost escalation remains relatively controlled across age and BMI. In contrast, smokers cluster heavily in high-charge regions as both age and BMI increase, illustrating compounding risk effects.

Overall, Page 2 demonstrates that smoking acts as a powerful cost amplifier, particularly when combined with higher age and BMI, while family size and region exert comparatively smaller influences.

Prediction & Model Performance

The third dashboard evaluates the performance of a linear regression model used to predict insurance charges.

[Figure: Prediction & Model Performance dashboard]

The actual charges versus predicted charges regression plot shows a strong linear relationship, indicating that the model captures general cost trends effectively. Separate regression views for smokers and non-smokers reveal that predictions are more accurate for non-smokers, while smoker-related extreme values introduce higher variance.
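
The source does not report numeric fit metrics, but the visual impression of an actual-versus-predicted plot is usually backed up with R², RMSE, and MAE. The sketch below shows how these could be computed from two parallel lists. The values are made up, with one deliberately under-predicted high-cost case.

```python
from math import sqrt

def regression_metrics(actual, predicted):
    """Return R^2, RMSE, and MAE for paired actual and predicted values."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }

# Made-up values; the last case is an under-predicted high-cost record.
actual = [3000, 5000, 9000, 40000]
predicted = [3500, 4500, 10000, 36000]

metrics = regression_metrics(actual, predicted)
print(metrics)
```

Calling the same function separately on smoker and non-smoker subsets gives a numeric counterpart to the per-group regression views.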

Residual analysis further supports this observation. The residual histogram shows errors centered near zero, indicating minimal systematic bias. However, the residual distribution has heavy tails, reflecting the difficulty of predicting extreme high-cost cases.

Residuals plotted against predicted charges highlight increasing variance at higher charge levels, a common characteristic in healthcare cost modeling.
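
A simple numeric check for this pattern is to sort cases by predicted charge, split them into equal-sized bands, and compare the residual spread in each band. The following sketch does that with made-up values. A larger standard deviation in the upper band corresponds to the widening pattern in the residual plot.

```python
from statistics import mean, pstdev

def residual_spread_by_band(actual, predicted, n_bands=2):
    """Standard deviation of residuals within equal-count bands of predicted value."""
    pairs = sorted(zip(predicted, actual))
    size = len(pairs) // n_bands
    spreads = []
    for b in range(n_bands):
        start = b * size
        # The last band absorbs any leftover records.
        end = (b + 1) * size if b < n_bands - 1 else len(pairs)
        residuals = [a - p for p, a in pairs[start:end]]
        spreads.append(pstdev(residuals))
    return spreads

# Made-up values: small errors at low charges, large errors at high charges.
predicted = [2000, 3000, 4000, 5000, 20000, 30000, 40000, 50000]
actual = [2100, 2900, 4100, 4900, 17000, 34000, 36000, 55000]

mean_residual = mean(a - p for a, p in zip(actual, predicted))
spreads = residual_spread_by_band(actual, predicted)
print(mean_residual, spreads)
```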

Box plots comparing actual and predicted charges across BMI categories and age groups demonstrate that the model accurately reflects median trends while underestimating some extreme outliers. This behavior aligns with the inherent unpredictability of high-risk healthcare expenses.

Collectively, Page 3 confirms that the regression model performs reliably for the majority of cases while identifying specific segments (primarily older obese smokers) where prediction uncertainty remains higher.

Conclusion

The three-page dashboard reveals several important insights. First, the population exhibits balanced age and regional distributions, allowing for generalizable analysis. Second, obesity and smoking emerge as dominant risk factors driving insurance costs, far outweighing the influence of gender, region, or family size.

The risk analysis confirms that cost escalation is not driven by single factors but by interactions between age, BMI, and smoking behavior. The predictive modeling results demonstrate that linear regression is effective for baseline forecasting, though extreme high-cost cases remain difficult to predict precisely.

The use of engineered features and interaction terms enhances interpretability and aligns predictive behavior with real-world insurance dynamics.

The Medical Cost Personal Data Analysis Dashboard provides a structured, multi-dimensional framework for exploring healthcare insurance costs using Dashtera’s interactive analytics environment. The dashboard enables:

  • Demographic and population profiling
  • Risk factor and lifestyle impact assessment
  • Cost distribution and segmentation analysis
  • Predictive modeling and residual evaluation

By integrating descriptive analytics with predictive insights, the dashboard supports insurance analysts, data scientists, and healthcare researchers in understanding cost drivers and identifying high-risk profiles.

The approach demonstrates how no-code BI platforms can effectively support complex analytical workflows in healthcare and insurance domains.
