Building a Weather Data Analytics Pipeline with AWS and OpenWeatherMap API

Hey coders! In this blog post, I will walk you through the process of building a weather data analytics pipeline using the OpenWeatherMap API and various AWS services. This project involves fetching weather data, storing it in an S3 bucket, cataloging it using AWS Glue, and querying it with Amazon Athena.

Project Overview

The goal of this project is to create a scalable and efficient data pipeline that can fetch weather data for multiple cities, store it in AWS S3, catalog the data using AWS Glue, and perform queries using Amazon Athena.

Prerequisites

Before you begin, ensure you have the following:

  1. Docker: Installed on your machine.
  2. AWS Account: With permissions to create S3 buckets, Glue databases, and Glue crawlers.
  3. OpenWeatherMap API Key: Obtain an API key from OpenWeatherMap.

Setup Instructions

Step 1: Clone the Repository

First, clone the repository and navigate to the project directory:

git clone https://github.com/Rene-Mayhrem/weather-insights.git
cd weather-insights

Step 2: Create a .env File

Create a .env file in the root directory with your AWS credentials and OpenWeatherMap API key:

AWS_ACCESS_KEY_ID=<your-access-key-id>
AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
AWS_REGION=us-east-1
S3_BUCKET_NAME=<your-s3-bucket-name>
OPENWEATHER_API_KEY=<your-openweathermap-api-key>

Step 3: Create a cities.json File

Create a cities.json file in the root directory with a list of cities to study:

{
  "cities": [
    "London",
    "New York",
    "Tokyo",
    "Paris",
    "Berlin"
  ]
}
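
To see how these two files come together, here is a minimal Python sketch of loading them, assuming the python-dotenv package is installed; the variable names are illustrative, not the repository's actual code:

import json
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

# Pull the bucket name and API key from .env into the environment
load_dotenv()
api_key = os.getenv("OPENWEATHER_API_KEY")
bucket_name = os.getenv("S3_BUCKET_NAME")

# Read the list of cities defined in cities.json
with open("cities.json") as f:
    cities = json.load(f)["cities"]

print(f"Fetching weather for {len(cities)} cities into {bucket_name}")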

Step 4: Using Docker Compose

Build and run the services using Docker Compose:

docker compose run terraform init
docker compose run python

Usage

After running the Docker containers, follow these steps:

Verify Infrastructure Setup

Ensure that Terraform has successfully created the necessary AWS resources (S3 bucket, Glue database, and Glue crawler). You can verify this in the AWS Management Console.

Verify Data Upload

Check that the Python script has fetched weather data for the specified cities and uploaded the data to the S3 bucket. Verify the JSON files in the S3 bucket via the AWS Management Console.
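
For reference, here is a rough sketch of what the fetch-and-upload step looks like with requests and boto3; the current-weather endpoint and parameters are OpenWeatherMap's, while the S3 key layout is just an assumption, not necessarily what the repository's script uses:

import json
import os
from datetime import datetime, timezone

import boto3
import requests

API_URL = "https://api.openweathermap.org/data/2.5/weather"

def fetch_and_upload(city: str, api_key: str, bucket: str) -> None:
    # Fetch the current weather for one city from OpenWeatherMap
    resp = requests.get(API_URL, params={"q": city, "appid": api_key, "units": "metric"})
    resp.raise_for_status()

    # Upload the raw JSON response to S3; this key layout is illustrative
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%S")
    key = f"weather/{city.replace(' ', '_').lower()}/{stamp}.json"
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=json.dumps(resp.json()))

if __name__ == "__main__":
    with open("cities.json") as f:
        for city in json.load(f)["cities"]:
            fetch_and_upload(city, os.environ["OPENWEATHER_API_KEY"], os.environ["S3_BUCKET_NAME"])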

Run the Glue Crawler

If everything is set up correctly, the Glue crawler runs automatically and catalogs the data in the S3 bucket. Verify the crawler run and the resulting tables in the Glue console.
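
If you prefer to trigger or monitor the crawler from code instead of the console, a small boto3 sketch like this works; weather-data-crawler is a hypothetical name, so substitute whatever your Terraform configuration created:

import time

import boto3

glue = boto3.client("glue")
crawler_name = "weather-data-crawler"  # hypothetical; use the crawler Terraform created

# Start the crawler, then poll until it returns to the READY state
glue.start_crawler(Name=crawler_name)
while True:
    state = glue.get_crawler(Name=crawler_name)["Crawler"]["State"]
    print(f"Crawler state: {state}")
    if state == "READY":
        break
    time.sleep(15)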

Query Data with Athena

Use Amazon Athena to query the data cataloged by Glue. Access Athena through the AWS Management Console and run SQL queries on the data.
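
Queries can be run interactively in the Athena console, or programmatically with boto3 as in the sketch below; the database name, table name, and results location are assumptions that depend on what the Glue crawler produced:

import time

import boto3

athena = boto3.client("athena")

# Database, table, and output location are assumptions; adjust them to your setup
query = athena.start_query_execution(
    QueryString="SELECT * FROM weather_data LIMIT 10",
    QueryExecutionContext={"Database": "weather_db"},
    ResultConfiguration={"OutputLocation": "s3://<your-bucket>/athena-results/"},
)
query_id = query["QueryExecutionId"]

# Wait for the query to finish, then print each result row
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
    print([col.get("VarCharValue") for col in row["Data"]])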

Conclusion

By following these steps, you can set up a robust weather data analytics pipeline using AWS services and the OpenWeatherMap API. This pipeline can be extended to include more cities or additional data sources as needed.