Setting up a local docker PostgreSQL database and exploring the F1 Ergast dataset using SQL

This is my main repo for showcasing my SQL competency. The project is entirely self-contained and can be cloned and run as is to use both the docker SQL database and the queries that I have written. I have specifically chosen a subset of the Ergast F1 dataset that I found to be the most interesting in my initial inspection of the data, used SQL to define and populate the database tables and then used SQL to combine individual tables into two separate output tables that each highlights different aspects of the dataset, using different SQL approaches.

The file schema.sql defines the tables, loads and filters the data and populates the database to prepare it for queries.

The queries to create the output are written in the analysis.sql files, which includes the two queries along with a few setup commands for temporary tables and commentary explaining what my approach was for the dataset and what I found interesting about the output from each query. Each query also includes multiple SELECT FROM / ORDER BY statements at the bottom, which are meant to be used one at a time, which is why all but one are commented out by default. Each of those SELECT FROM / ORDER BY statements highlight an interesting finding from the rather large output tables, that would otherwise be harder to see without sorting it in specific ways.

Features

Two queries with different purposes:

Query #1 looks at individual racers performance over time and compares metrics like avg. points per win, historical standard deviation for points etc., with the purpose of looking at interesting differences between the different drivers' performance.
Query #2 looks at unique combinations of teams and drivers and specifically highlight potential strategic choices made by team when choosing what drivers to hire. As in, did a team choose two high volatility drivers or a more balanced pairing of drivers for their new team? How did the drivers standard deviation in points scored develop over while while in that unique pairing of teams and drivers? Amongst other questions. This query also tackles the classic gaps and islands problem, in order to be able to correctly split unique combinations of teams and drivers into stint. As a unique combination of a team and two drivers can potentially be repeated later in time, with a gap in between, the gaps and islands approach allows for those two stints to be correctly identified.

Basic PowerBi visualization

I created a basic PowerBi visualization to show that the database imports cleanly into PowerBi and that the data output of my SQL queries is sensible. The visualization can be seen here:

https://app.powerbi.com/view?r=eyJrIjoiZDFhOGMyMTMtYzBjMS00Mjc1LTgzN2UtMGJjNjEzMDA3N2ZlIiwidCI6IjcwZjRhY2NiLTM3N2UtNDg5ZS04YjhiLTI4NjllYjQwYmQ3MSJ9

The graphs is just a simple scatterplot showing the difference in drivers' total career points between using the legacy scoring models in place when a given race took place and adjusting all of a driver's points to match the newest 25-point scoring model used in modern F1 races.

Legacy career points are shown on the x-axis. Modern career points are shown on the y-axis and the dots a colored depending on how large the difference is between the modern and the legacy career points totals:

Green: Less than 50% difference
Yellow: Between 50% and 100% difference
Red: More than 100% difference

Recommended tooling (what I used)

VS Code for displaying the code and running the queries
- The Docker / Container Tools extension for VS Code
- The SQLTools & SQLTools PostgreSQL extensions for VS Code
Docker

Spin up the Docker database using the command docker compose up -d in PowerShell.

Data Sources

F1 Ergast dataset from Kaggle: https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2020?resource=download

Tech Stack

SQL
Docker

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
F1-Ergast-data-files		F1-Ergast-data-files
.gitattributes		.gitattributes
.gitignore		.gitignore
Analysis.sql		Analysis.sql
README.md		README.md
docker-compose.yml		docker-compose.yml
schema.sql		schema.sql

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Setting up a local docker PostgreSQL database and exploring the F1 Ergast dataset using SQL

Features

Basic PowerBi visualization

Recommended tooling (what I used)

Data Sources

Tech Stack

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Setting up a local docker PostgreSQL database and exploring the F1 Ergast dataset using SQL

Features

Basic PowerBi visualization

Recommended tooling (what I used)

Data Sources

Tech Stack

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages