STI Through The Eyes Of Filipino Women
A data exploration journey where we delve into women’s perception of HIV. Armed with laptops (and coffee :>) and curiosity, we aim to uncover how education shapes attitudes towards HIV among women. While we’re just students with data in hand, we hope to stimulate discussion and contribute to the development of targeted programs and educational campaigns that address HIV among women.

Quick Look at Our Project.


Why Are We Doing This?

HIV remains a significant public health concern in the Philippines. Despite ongoing efforts to educate the public, many people still lack knowledge of how HIV is transmitted and prevented, and many hold misconceptions about it.
Without open discussion of sex and health, many women in the Philippines remain unaware of HIV/AIDS and of the measures that protect against it. This gap in knowledge puts them at risk and leaves them unsure how to protect themselves.

Data Collection

How did we collect our data?

Where Did We Get Our Data?


Our dataset is a subset of the 2022 Philippines National Demographic and Health Survey, featuring about 35,000 households. Our focus is on basic demographic indicators, health status, and knowledge and attitudes regarding HIV/AIDS.

What Was Our Sampling Method?


The sampling scheme provides data representative of the country as a whole, of urban and rural areas separately, and of each of the country’s administrative regions. In the first stage, 1,247 primary sampling units (PSUs) were selected systematically by province or highly urbanized city (HUC). In the second stage, 22 or 29 housing units were selected from each PSU by systematic random sampling.
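The systematic step can be illustrated with a short Python sketch. It assumes a simplified, equal-probability scheme (one random start, then a fixed interval) applied to a hypothetical list of housing units; it does not reproduce the survey's actual stratification or selection probabilities, and `systematic_sample` is a helper name introduced here for illustration.

```python
import random


def systematic_sample(units, n, rng):
    """Draw n units from an ordered list with a random start and a fixed interval.

    Hypothetical, simplified equal-probability sketch; the real survey design
    adds stratification and other details not modeled here.
    """
    total = len(units)
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and len(units)")
    # The sampling interval is total / n. Picking r uniformly from 0..total-1
    # places the random start at r / n, somewhere inside the first interval.
    r = rng.randrange(total)
    # Unit i sits at floor(r / n + i * total / n); integer math avoids rounding errors.
    return [units[(r + i * total) // n] for i in range(n)]


# Hypothetical listing of 300 housing units in one PSU; draw 22 of them.
housing_units = [f"HU-{i:03d}" for i in range(300)]
picked = systematic_sample(housing_units, 22, random.Random(2022))
print(len(picked), picked[:3])
```

Because the interval here is 300 / 22 (about 13.6), consecutive picks land 13 or 14 positions apart, and every unit has the same chance of selection.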

How Did We Clean Our Data?


We prepared the data by renaming columns, handling missing values, removing irrelevant columns, converting data types to integers where applicable, and splitting, cleaning, and recombining the dataset, leaving a streamlined and accurate dataset ready for analysis.
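A minimal pandas sketch of these cleaning steps follows. It assumes the raw extract uses DHS-style recode variable names (v012, v013, v106, v107) that get mapped to the readable labels used later; the toy rows and the column selection are illustrative, and the split-and-recombine step is not reproduced.

```python
import numpy as np
import pandas as pd

# Assumed mapping from DHS-style recode names to readable labels.
RENAME = {
    "v012": "Respondent's current age",
    "v013": "Age in 5-year groups",
    "v106": "Highest educational level",
    "v107": "Highest year of education",
}


def clean(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw.rename(columns=RENAME)   # readable column names
    df = df[list(RENAME.values())]    # drop columns irrelevant to the analysis
    df = df.dropna()                  # drop rows with missing values
    return df.astype(int)             # float codes -> integers


# Toy rows (not survey data): NaNs force a float dtype, "caseid" is irrelevant here.
raw = pd.DataFrame({
    "caseid": ["a", "b", "c", "d"],
    "v012": [25.0, 31.0, np.nan, 19.0],
    "v013": [3.0, 4.0, 2.0, 1.0],
    "v106": [1.0, 2.0, 3.0, np.nan],
    "v107": [6.0, 4.0, 2.0, 3.0],
})
print(clean(raw))
```

Dropping missing rows before the integer cast matters: `astype(int)` fails on columns that still contain NaN.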

Data Exploration

How did we explore our data?



Using a Jupyter Notebook

Preprocessing


We chose 17 survey questions to gauge perceptions of HIV and other STIs among respondents who were aware of these conditions. We then applied preprocessing steps such as handling missing values, standardizing numerical features, and encoding categorical variables so that the data was clean and ready for analysis.
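One way to carry out these steps is sketched below with pandas and scikit-learn's `StandardScaler`. The answer encoding (Yes = 1, No and Don't know = 0), the question names `Q1`/`Q2`, the `age` column, and the toy rows are all assumptions for illustration; the exact scheme used in the project is not stated here.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumed encoding of the three answer options.
ANSWER_CODES = {"Yes": 1, "No": 0, "Don't know": 0}


def preprocess(df, question_cols, numeric_cols):
    out = df.copy()
    for col in question_cols:
        out[col] = out[col].map(ANSWER_CODES)      # unmapped or missing answers -> NaN
    out = out.dropna(subset=question_cols).copy()  # drop respondents with unusable answers
    scaled = StandardScaler().fit_transform(out[numeric_cols])  # mean 0, std 1 per column
    for i, col in enumerate(numeric_cols):
        out[f"{col} (z)"] = scaled[:, i]           # keep raw values alongside z-scores
    return out


# Toy rows (not survey data).
survey = pd.DataFrame({
    "Q1": ["Yes", "No", "Don't know", "Yes"],
    "Q2": ["Yes", "Yes", "No", None],
    "age": [20, 30, 40, 50],
})
print(preprocess(survey, ["Q1", "Q2"], ["age"]))
```

Keeping the raw column next to its z-score lets later group analyses still use real ages.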

Visualization


We used Plotly to create interactive pie charts for each survey question. These visualizations allowed us to understand the distribution of responses (Yes, No, Don't Know) for each question. We configured subplot layouts to neatly display the pie charts, making it easier to compare the response distributions across all questions.

Quantification


We calculated the "Perception Mean" for each respondent by averaging their responses across all relevant survey questions, giving a single metric that represents overall perception. We then displayed descriptive statistics of the Perception Mean to understand its distribution.
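Assuming the answers are already encoded as numbers, the row-wise average is a single pandas call. The question columns and values below are toy data, not survey results.

```python
import pandas as pd

# Toy encoded answers (1 = yes, 0 = otherwise) for three hypothetical questions.
answers = pd.DataFrame({
    "Q1": [1, 0, 1, 1],
    "Q2": [1, 1, 0, 1],
    "Q3": [0, 0, 1, 1],
})
question_cols = ["Q1", "Q2", "Q3"]

# Average across the question columns for each respondent (row).
# Note: mean() skips NaN by default, so missing answers are ignored per row.
answers["Perception Mean"] = answers[question_cols].mean(axis=1)

# Count, mean, std, min, quartiles, and max of the new metric.
print(answers["Perception Mean"].describe())
```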

Group Analysis


We examined how the Perception Mean varied across features such as Age, Region, Residence, Language, and Educational Level. For each highest educational level, we displayed the count, mean, minimum, and maximum perception values.
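A sketch of the education breakdown with pandas `groupby` and `agg` follows; the rows are toy values, not the project's results.

```python
import pandas as pd

# Toy data (not survey results): one Perception Mean per respondent.
df = pd.DataFrame({
    "Highest educational level": ["Primary", "Secondary", "Higher", "Secondary", "Primary", "Higher"],
    "Perception Mean": [0.40, 0.60, 0.80, 0.70, 0.50, 0.90],
})

# Count, mean, minimum, and maximum Perception Mean for each education group.
summary = (
    df.groupby("Highest educational level")["Perception Mean"]
    .agg(["count", "mean", "min", "max"])
)
print(summary)
```

Swapping the grouping column (for example to a region or residence column) gives the other breakdowns with the same call.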

Data Modeling

How did we model our data?

Visualization for Modeling

We used Plotly to create interactive plots that helped us visualize relationships within the data. Beyond exploration, these visualizations laid the groundwork for modeling: understanding how perception scores are distributed helped us prepare the data for predictive modeling.

Perception Mean Box Plot

Box Plot


We created a box plot to visualize the distribution of the Perception Mean across educational levels. This helped us identify outliers and understand the central tendency and spread of perception scores within each education group.
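The numbers behind each box can be computed directly, as in the pandas sketch below. It uses the common Tukey convention (points more than 1.5 × IQR beyond the quartiles count as outliers); the data are toy values and `box_stats` is a hypothetical helper.

```python
import pandas as pd


def box_stats(values: pd.Series) -> dict:
    """Quartiles and Tukey-style outliers (beyond 1.5 x IQR from the box)."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = values[(values < low) | (values > high)]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers.tolist()}


# Toy data (not survey results).
df = pd.DataFrame({
    "Highest educational level": ["Primary"] * 5 + ["Secondary"] * 4,
    "Perception Mean": [0.40, 0.45, 0.50, 0.55, 0.95, 0.60, 0.62, 0.64, 0.66],
})

stats = {
    level: box_stats(group)
    for level, group in df.groupby("Highest educational level")["Perception Mean"]
}
print(stats)
```

For the chart itself, `plotly.express.box(df, x="Highest educational level", y="Perception Mean")` draws one box per level.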
Perception Mean Box Plot

Normal Distribution


We generated scatter plots overlaid with normal distribution curves to visualize the distribution of the Perception Mean for each educational level group. This provided insights into the shape and spread of the data and how perception is distributed across different educational backgrounds.
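A sketch of the fitted curves is below: for each group it takes the sample mean and standard deviation and evaluates the normal density over mean ± 3 standard deviations. The helper names, column names, and toy rows are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def fitted_curves(df, group_col, value_col, points=101):
    """For each group, x values spanning mean +/- 3 std and the fitted normal density."""
    curves = {}
    for level, values in df.groupby(group_col)[value_col]:
        mu, sigma = values.mean(), values.std()  # sample standard deviation (ddof=1)
        x = np.linspace(mu - 3 * sigma, mu + 3 * sigma, points)  # odd count puts mu at the center
        curves[level] = (x, normal_pdf(x, mu, sigma))
    return curves


# Toy data (not survey results).
df = pd.DataFrame({
    "Highest educational level": ["Primary"] * 3 + ["Higher"] * 3,
    "Perception Mean": [0.40, 0.50, 0.60, 0.60, 0.70, 0.80],
})
curves = fitted_curves(df, "Highest educational level", "Perception Mean")
```

Each `(x, y)` pair can then be drawn as a Plotly line trace over a scatter of the raw scores for that group.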
Master Plot

Nutshell Graph


The nutshell graph illustrates the mean perception of sexually transmitted infections (STIs) across ages, separated by highest educational attainment, based on data from the 2022 Philippines National Demographic and Health Survey. The y-axis shows the mean perception level (ranging from 0.45 to 0.74), while the x-axis shows age (ranging from 15 to 50).

The lines are color-coded by educational level: blue for primary, red for secondary, and green for higher education. Respondents aged 15 with only a primary education have the lowest perception mean, at 0.4538. Meanwhile, respondents aged 19 with a college-level education have the highest, at 0.7353.

The data suggest a positive correlation between educational attainment and STI awareness: higher levels of education are associated with greater awareness. This highlights the vital role education can play in enhancing STI awareness.
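The table behind a graph like this can be sketched with pandas: average the Perception Mean within each (age, education) cell, then locate the lowest and highest cells. The column names follow the features named elsewhere on this page; the rows are toy values, not survey results.

```python
import pandas as pd

AGE = "Respondent's current age"
EDU = "Highest educational level"
TARGET = "Perception Mean"


def mean_by_age_and_education(df):
    """Average Perception Mean per (age, education) cell, plus the lowest and highest cells."""
    table = df.groupby([AGE, EDU])[TARGET].mean().reset_index()
    lowest = table.loc[table[TARGET].idxmin()]
    highest = table.loc[table[TARGET].idxmax()]
    return table, lowest, highest


# Toy data (not survey results).
df = pd.DataFrame({
    AGE: [20, 20, 20, 21, 21, 21],
    EDU: ["Primary", "Primary", "Higher", "Primary", "Higher", "Higher"],
    TARGET: [0.50, 0.60, 0.70, 0.52, 0.80, 0.90],
})
table, lowest, highest = mean_by_age_and_education(df)
print(table)
print("lowest:", lowest[AGE], lowest[EDU], lowest[TARGET])
print("highest:", highest[AGE], highest[EDU], highest[TARGET])
```

`plotly.express.line(table, x=AGE, y=TARGET, color=EDU)` then draws one line per education level.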


Linear Regression Model

We used scikit-learn’s Linear Regression model to understand the relationship between our target, "Perception Mean," and several features: "Respondent's current age," "Age in 5-year groups," "Highest educational level," and "Highest year of education."

Training and Testing



We split the data into 70% training and 30% testing sets to train and evaluate our model. The LinearRegression object was instantiated and trained using the fit method on the training data.
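A minimal sketch of this step with scikit-learn's `train_test_split` and `LinearRegression` follows. The feature and target names come from this page, but the data are synthetic, and the fixed `random_state` and helper name `fit_model` are assumptions added for reproducibility.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

FEATURES = [
    "Respondent's current age",
    "Age in 5-year groups",
    "Highest educational level",
    "Highest year of education",
]
TARGET = "Perception Mean"


def fit_model(df, seed=42):
    """70/30 train-test split, then an ordinary least-squares fit on the training part."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[TARGET], test_size=0.3, random_state=seed
    )
    model = LinearRegression()
    model.fit(X_train, y_train)
    return model, X_test, y_test


# Synthetic stand-in data (not survey results).
rng = np.random.default_rng(0)
n = 100
age = rng.integers(15, 50, n)          # ages 15-49
education = rng.integers(0, 4, n)      # coded 0-3
data = pd.DataFrame({
    FEATURES[0]: age,
    FEATURES[1]: (age - 15) // 5 + 1,  # 1 = 15-19, 2 = 20-24, ...
    FEATURES[2]: education,
    FEATURES[3]: rng.integers(0, 7, n),
    TARGET: 0.4 + 0.08 * education + rng.normal(0, 0.05, n),
})
model, X_test, y_test = fit_model(data)
print(dict(zip(FEATURES, model.coef_.round(3))), round(model.intercept_, 3))
```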
Evaluation Metrics



Our regression models were assessed using R-squared and Mean Squared Error (MSE). Lower MSE values indicate more accurate predictions, while R-squared measures the proportion of variance in the dependent variable explained by the independent variables (values closer to 1 indicate a better fit).
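To make both metrics concrete, the sketch below computes them by hand with NumPy and checks the results against scikit-learn's `mean_squared_error` and `r2_score`. The values are toy numbers, not model output.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def mse(y_true, y_pred):
    """Mean of squared prediction errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot


# Toy values (not model output).
y_true = [0.50, 0.60, 0.70, 0.80]
y_pred = [0.52, 0.58, 0.71, 0.77]
print(mse(y_true, y_pred), mean_squared_error(y_true, y_pred))  # both about 0.00045
print(r_squared(y_true, y_pred), r2_score(y_true, y_pred))      # both about 0.964
```

On held-out data, R-squared can even be negative when a model predicts worse than simply using the mean of the target.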
Feature Analysis



We analyzed each feature individually to determine its impact on "Perception Mean." This involved fitting a separate model for each feature and recording its coefficient along with the corresponding MSE and R-squared values.
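The per-feature loop can be sketched as follows. The synthetic data deliberately make only education informative, so the sketch shows the mechanics rather than any real finding; the helper name `per_feature_models` and the fixed seed are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def per_feature_models(df, features, target, seed=42):
    """Fit one single-feature linear regression per feature and collect its metrics."""
    rows = []
    for feature in features:
        X_train, X_test, y_train, y_test = train_test_split(
            df[[feature]], df[target], test_size=0.3, random_state=seed
        )
        model = LinearRegression().fit(X_train, y_train)
        pred = model.predict(X_test)
        rows.append({
            "feature": feature,
            "coefficient": model.coef_[0],
            "MSE": mean_squared_error(y_test, pred),
            "R-squared": r2_score(y_test, pred),
        })
    return pd.DataFrame(rows).set_index("feature")


# Synthetic stand-in data (not survey results): only education drives the target here.
rng = np.random.default_rng(1)
n = 200
data = pd.DataFrame({
    "Respondent's current age": rng.integers(15, 50, n),
    "Highest educational level": rng.integers(0, 4, n),
})
data["Perception Mean"] = (
    0.4 + 0.08 * data["Highest educational level"] + rng.normal(0, 0.02, n)
)

results = per_feature_models(
    data, ["Respondent's current age", "Highest educational level"], "Perception Mean"
)
print(results)
```

Reusing the same `random_state` for every feature gives each single-feature model the same train/test rows, so their MSE and R-squared values are directly comparable.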


Results

Meet The Team

We're here to help if you have any questions...

Ezra Guiao

III- BS Computer Science | CS 132 WFV

Shane Odhuno

III- BS Computer Science | CS 132 WFV

Gabbie Purisima

III- BS Computer Science | CS 132 WFV