IBM NOAA Project: Predicting the Rain: A Data Science Journey through JFK Airport Weather Patterns

 

By Matthew Ternenge Beeun

Weather forecasting has always felt like a mix of high-level physics and a bit of magic. But as a Data Scientist, I know it’s actually a mix of high-quality data and robust statistical modeling. I recently completed a deep-dive analysis into the local climatological data from John F. Kennedy International Airport (JFK). Using RStudio, I set out to answer a fundamental question for a weather forecast firm: Can we accurately predict daily precipitation using variables like temperature, wind speed, and pressure?

The Approach: The 7-Step Workflow

I followed a rigorous data science lifecycle to ensure the insights were both statistically sound and business-relevant:

  1. Data Ingestion: Sourcing historical NOAA records.

  2. Wrangling: Cleaning "Trace" values and converting raw strings into usable numbers.

  3. Exploratory Data Analysis (EDA): Using histograms and scatterplots to see the "shape" of the weather.

  4. Modeling: Building a Multiple Linear Regression engine.

  5. Prediction: Testing the engine on "unseen" historical dates.

  6. Visual Assessment: Using heatmaps to identify our "MVP" (Most Valuable Predictor) variables.

  7. Numerical Validation: Calculating error metrics like MSE and RMSE.
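
To make steps 2, 4, and 5 concrete, here is a minimal base R sketch. It is not the project's actual code: the toy values, the column names (precip, humidity, pressure), the choice to map "T" (trace) to zero, the trailing-letter flag, and the 80/20 split are all assumptions made for illustration.

```r
# Toy raw records; every value and column name here is made up for illustration.
raw <- data.frame(
  precip   = c("0.00", "T", "0.12", "0.00", "0.45s", "T", "0.03", "0.00", "0.30", "0.08"),
  humidity = c(55, 70, 88, 50, 95, 72, 80, 48, 92, 85),
  pressure = c(30.10, 29.95, 29.70, 30.20, 29.50, 29.90, 29.80, 30.25, 29.60, 29.75),
  stringsAsFactors = FALSE
)

# Wrangling: treat "T" (trace) as 0 and strip a trailing letter flag.
# Both conventions are assumptions about how the raw strings look.
clean_precip <- function(x) {
  x <- trimws(x)
  x[x == "T"] <- "0.00"
  x <- sub("[A-Za-z]+$", "", x)
  as.numeric(x)
}
weather <- raw
weather$precip <- clean_precip(raw$precip)

# Split: hold out 20% of the rows as "unseen" data (the ratio is an assumption).
set.seed(1234)
n <- nrow(weather)
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- weather[train_idx, ]
test  <- weather[-train_idx, ]

# Modeling and prediction: multiple linear regression with two predictors.
fit  <- lm(precip ~ humidity + pressure, data = train)
pred <- predict(fit, newdata = test)

print(coef(fit))
print(data.frame(actual = test$precip, predicted = pred))
```

With real NOAA records the same pattern applies, only with more rows and whichever predictors the analysis keeps.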

What the Data Told Us

The most exciting part of any analysis is the "Aha!" moment. By plotting a Correlation Heatmap, I was able to single out exactly what drives the rain at JFK.

  • Humidity is King: There is a moderate positive correlation (0.41) between relative humidity and rainfall.

  • The Pressure Signal: We saw a weaker negative correlation (-0.21) with station pressure. In simple terms: when the barometer drops, grab your umbrella.

  • The Model’s Accuracy: Our model achieved a Root Mean Squared Error (RMSE) of 0.22. In other words, a typical forecast misses by a little under a quarter-inch of rain, with large misses weighted more heavily than small ones. That is a respectable margin for a linear model in a highly variable environment.
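
For readers who want to see where numbers like these come from, the sketch below computes a correlation matrix and the MSE/RMSE metrics in base R. The data are invented toy values and the column names are hypothetical, so the output will not match the 0.41, -0.21, or 0.22 reported above.

```r
# Error metrics written out in base R.
mse  <- function(actual, predicted) mean((actual - predicted)^2)
rmse <- function(actual, predicted) sqrt(mse(actual, predicted))
mae  <- function(actual, predicted) mean(abs(actual - predicted))

# Toy data; every value and column name is invented for illustration.
toy <- data.frame(
  precip   = c(0, 0, 0.12, 0, 0.45, 0, 0.03, 0, 0.30, 0.08),
  humidity = c(55, 70, 88, 50, 95, 72, 80, 48, 92, 85),
  pressure = c(30.10, 29.95, 29.70, 30.20, 29.50, 29.90, 29.80, 30.25, 29.60, 29.75)
)

# Pairwise Pearson correlations: the numbers behind a correlation heatmap.
cor_matrix <- cor(toy)
print(round(cor_matrix, 2))

# Compare forecasts with observations on three made-up days.
actual    <- c(0.0, 0.1, 0.5)
predicted <- c(0.1, 0.1, 0.3)
cat("MSE: ", mse(actual, predicted), "\n")
cat("RMSE:", rmse(actual, predicted), "\n")
cat("MAE: ", mae(actual, predicted), "\n")
```

RMSE is never smaller than the mean absolute error (MAE) because squaring gives extra weight to the biggest misses, which is why it is better read as a typical error size than as a plain average.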

The visuals that support these findings are below:


The Challenge of "Zero-Inflation"

One of the key takeaways from this project was the difficulty of predicting "No Rain" days. Because most days at JFK are dry, the observed rainfall is zero far more often than any other value, and a linear model cannot reproduce that spike at zero. Instead, it tends to forecast small amounts of rain on days that were actually dry. This is a classic "zero-inflation" problem in precipitation data, and it highlights why data scientists are constantly iterating on their models!
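
The effect is easy to reproduce on a small scale. The sketch below uses invented toy data (8 dry days, 4 wet days, one hypothetical humidity predictor) rather than the JFK records, and simply checks what a linear model forecasts on the dry days.

```r
# Toy data: 8 dry days and 4 wet days; all values are invented for illustration.
toy <- data.frame(
  humidity = seq(40, 95, by = 5),
  precip   = c(rep(0, 8), 0.1, 0.3, 0.6, 1.0)
)

fit  <- lm(precip ~ humidity, data = toy)
pred <- predict(fit, newdata = toy)

dry <- toy$precip == 0

dry_share      <- mean(dry)           # fraction of days with no rain
mean_pred_dry  <- mean(pred[dry])     # average forecast on days that were actually dry
n_wet_forecast <- sum(pred[dry] > 0)  # dry days where the model still forecasts rain
n_negative     <- sum(pred < 0)       # forecasts below zero, which cannot happen in reality

cat("Share of dry days:             ", round(dry_share, 2), "\n")
cat("Mean forecast on dry days:     ", round(mean_pred_dry, 3), "\n")
cat("Dry days with a wet forecast:  ", n_wet_forecast, "\n")
cat("Forecasts below zero:          ", n_negative, "\n")
```

On this toy set the straight line cannot sit flat at zero and still reach the wet days, so it forecasts some rain on several dry days and even negative rain on the driest ones.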

Final Thoughts

This project was a great reminder that data isn't just numbers on a spreadsheet; it’s a story about our environment. By leveraging R and Multiple Linear Regression, we can turn raw atmospheric observations into actionable insights for airport operations and public safety.

Check out my full code and the cleaned dataset on my GitHub!

  • Keywords: #RStats, #DataScience, #Meteorology, #MachineLearning
