title	Stat 184 Course Project Guidelines
date	Spring 2026

This repo will serve as a template for your Stat 184 course project, providing you with an initial README file, two CSL files (APA7 and MLA9) for reference citation styles, initial .gitignore and .lintr files, and the project guidelines (this file).

This Project Guidelines file contains all of the guidelines for the project, including key dates and a list of learning outcomes assessed. Further, we've attempted to provide you with some hints and suggestions. Be sure to read through all portions of the Project Guidelines carefully.

Project Context and Purpose

The context for this project is that you’ve made it through several phases of a hiring process at Big Data, Inc. They are looking for the best statisticians, data scientists, and data analysts to join the firm.

They have hired you as a Probationary Data Scientist and want to see how you work as part of a team and what you can produce. Thus, this project acts as your demonstration of your skills and knowledge.

The purpose of this project is to provide you with an opportunity to put into practice everything that you have learned over the course of the semester and to push yourselves. To this end, you’ll work in teams (2-3 people) to conduct your own Exploratory Data Analysis, posing and answering your own research question(s).

Key Dates

There are multiple key dates that you and your project team need to be aware of as you complete this project.

Date	Notes
Sec. 1 & 4: 4/20/26; Sec. 2 & 5: 4/16/26	Checkpoint #1: Form teams and make your team contract (on paper, due by end of class). You must be physically present in class in order to join a team.
Sec. 1 & 4: 4/22/26; Sec. 2 & 5: 4/21/26	Checkpoint #2: Approval of Initial Data and Plan (in MOM, due by end of class)
Sec. 1 & 4: 4/24/26; Sec. 2 & 5: 4/23/26	Checkpoint #3: GitHub Repo Check (in MOM, due by end of class)
Sec. 1 & 4: 4/24/26; Sec. 2 & 5: 4/23/26	Checkpoint #4: Sign up for Work-in-Progress Presentation Slot (in class)
4/27/26--5/1/26	Checkpoint #5A: Work-in-Progress Presentations and Checkpoint #5B: Peer Feedback. Each person MUST be present for all presentations.
5/6/26; Sec. 2: 5/5/26	Checkpoint #6: Submission of Final Report (PDF, Canvas) and Link to Repo (submission comment)
5/7/26; Sec. 2: 5/5/26	Checkpoint #7: Submission of Self- & Peer Evaluations

Checkpoints #1-#5 must be completed in person.

Project Guidelines

The following checklist provides your team with an expanded listing of all elements necessary in your group project. You can keep track of what you have and have yet to complete by editing this file and placing an x inside of the square brackets at the start of each item (i.e., [x]).

Data Requirements

Your team may work with any data collection you wish, provided you satisfy certain conditions. These conditions include:

The data may not be any of the sets/collections we've used in the class, including any examples, homework, activities, quizzes, or other assignments.
The data should be real, not fake or synthetic. You may not use genAI tools to create or find data sets. If you are unsure, talk to your instructor.
Your data sets/collections should be rich for exploration. That is, there should be multiple attributes (i.e., more than 3) and multiple types (e.g., categorical/qualitative and quantitative) for your team to explore. We strongly encourage your team to look for multiple data sets/collections that you can link together so that the sets complement/supplement each other.
You must be able to explain the data provenance as fully as possible for each data set/collection you use. This includes the who, what, when, where, why, and how the data were originally collected.
You must be able to explain how your data satisfy (or not) both the FAIR and CARE principles.
Your data may not come from or be found on Kaggle. It is your team's responsibility to complete this check. Projects using data from Kaggle will be treated as non-submissions.
Your data may not come from another project you are doing or have done for a grade in any other course without prior permission. This also includes any Honors thesis or projects.
You may only use data that you have a legal right to be using.
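The suggestion in the list above to link multiple data sets together can be sketched in base R as follows. This is only an illustration under stated assumptions: the two data frames, their column names, and their values are invented placeholders to show the mechanics of a key-based join, and are not real data you could use for the project.

```r
# Hypothetical toy tables; every name and value here is a placeholder.
songs <- data.frame(
  song_id = c(1, 2, 3),
  title = c("Song A", "Song B", "Song C"),
  artist_id = c(10, 20, 10)
)
artists <- data.frame(
  artist_id = c(10, 20),
  genre = c("rock", "pop")
)

# Link the tables on their shared key column.
# all.x = TRUE keeps every row of songs, even if no artist matches.
linked <- merge(songs, artists, by = "artist_id", all.x = TRUE)

# Sanity check: linking should neither add nor drop cases.
stopifnot(nrow(linked) == nrow(songs))
```

Checking the row count before and after a join is a cheap way to catch duplicated or missing keys early in your wrangling.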

The key factor in determining whether a certain data set/collection is allowed is the analysis. We want your project to reflect your thinking and skill sets from this course, not someone else's (including genAI). To this end, you cannot submit data analysis work that you've done on a particular data set for another course as this is a violation of Academic Integrity (i.e., self-plagiarism). Thus, if you want to use a data set/collection you've used in another course, you are going to have to provide clear documentation of what you've done for that other course and ensure that you (or any member of your team) do not duplicate that work for this project.

We do not allow Kaggle data sets/collections to be used for this project for three reasons. First, the data shared on Kaggle are not necessarily real data. Second, the data (real or fake) may have been posted without permission or acknowledgement of the original data stewards. Finally, Kaggle data sets/collections are often accompanied by data analysis reports from one or more Kaggle users. This raises questions about whether we are looking at your work or a copy of someone else's.

If there are data sets you are interested in using that come from R packages, talk to your instructor before using them. If you or a team member works in a research lab on campus and/or is doing research on campus, you may use that data provided you've been given written permission to use it by the lab's Principal Investigator (typically, the faculty member in charge).

Work-in-Progress Presentations

In the last week of classes (Apr. 27th--May 1st, 2026), each group will give a short/brief presentation to the class (approximately 5 minutes). In the presentation, each team will share

what they are exploring,
one (1) insight they have had so far, and
one (1) challenge they have encountered as they work (if any).

Each member of the team must take part in the presentation and speak. As the name suggests, these presentations are not meant for teams to present polished/completed reports. Rather, the idea is to convey what you're currently working on.

Peer Feedback

While each team is giving their presentation, all other students in the class will be filling out a simple peer feedback form. Students who do not pay attention to their peers or who do not provide meaningful feedback will receive a lower grade.

Additional Guidance and Hints

There are a few pieces of advice that we want to give your team at this time for your project.

Planning will help your team get and stay organized.
Communicate! The success of your team will hinge upon how well you communicate with each other.
Draw upon each other's strengths and have everyone take a look at each other's work.
Proofread your report before you submit.
When you're unsure of something or running into a problem the team can't figure out, use your network of peers and your instructor.

Data and Topics

Is your team stuck trying to figure out where to look for data or a topic to explore? Check out these resources.

Data Repositories

https://www.icpsr.umich.edu/sites/icpsr/home
https://www.re3data.org/
https://openpaymentsdata.cms.gov/
https://aact.ctti-clinicaltrials.org/
https://www.esportsearnings.com/
SCORE Sports Data Repository: https://data.scorenetwork.org/
CMU Statistics and Data Science Repository: https://cmustatistics.github.io/data-repository/
The R package {quantmod} allows you to read Yahoo Finance data. See more at https://www.quantmod.com/examples/intro/
https://www.data.gov/
https://github.com/fivethirtyeight/data
https://github.com/nytimes
https://github.com/rfordatascience/tidytuesday
https://www.data-is-plural.com/ A newsletter dedicated to useful/curious data sets (over 400 editions of the newsletter are available)

You can also check with your instructor to see if they might have some data files you might be able to use.

Keep in mind that submitting a project that looks too similar to work done by someone else would be an academic integrity violation.

Topic Ideas

How have popular songs changed over the last 5-6 decades? For instance, beats per minute, genres, number of unique words, etc.
How have food prices changed in the past two decades? For instance, the average cost of vegetables, fruit, and meat, and the year-to-year inflation rates.
How have the key economic factors changed over the last 5-6 decades? For instance, the unemployment rate, S&P 500 index, CPI, median salary, and Housing price index.
How has the eSports industry changed over the last decades? For instance, the number of tournaments, the number of people viewing the game, the total prize money, and the total estimated economic income (direct and indirect).
Do parodies of songs or the originals use more unique words? Does the genre of the original song appear to have an effect?
Investigate some part(s) of daily life that you can compare and contrast between your teenage years and the time period when your parents were teenagers (e.g., weather, economy, politics).

As a team you can discuss prompts such as the following to help you come up with ideas:

What shared interests does your team have?
Do you all like a particular sport?
Do you like a particular type of music?
Do you like to play video games?

If your team is completely stuck, talk with your instructor.

Resources and Hints

The PCIP System
The Data Wrangling Shiny App
The ASU Image Accessibility AI Tool
The Quarto Crash Course Guide
PSU Libraries Zotero Guide
Adopt a coding style guide for your team to follow throughout your files. Here are a few options:
Make sure to use the {lintr} package to lint your files so that stylistic errors get flagged and addressed.
- The defaults of the lint function are for the Tidyverse style.
- The provided .lintr file in this template is configured for BOAST/Tidyverse.
Draw on the Contributor Role Taxonomy to report who did what.
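For the {lintr} item in the list above, here is a minimal sketch of what linting looks like from the R console. It assumes the {lintr} package is installed, and it lints a throwaway temporary file so that it runs anywhere; in your project you would pass the path to your own script instead. Which issues get reported depends on the linter configuration in effect, so treat the comments as expectations rather than guaranteed output.

```r
# Assumes {lintr} is installed, e.g. via install.packages("lintr").
library(lintr)

# Write a small throwaway script that uses `=` for assignment,
# which the Tidyverse style discourages in favor of `<-`.
tmp <- tempfile(fileext = ".R")
writeLines("x = 1", tmp)

# lint() returns the flagged issues for one file. When no .lintr
# configuration applies to the file, the default linters are used.
issues <- lint(tmp)
print(issues)
```

When you lint files inside this template, the provided .lintr file is used in place of the defaults.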

What to Include in Your Exploratory Data Analysis

Channel the beliefs and dispositions of EDA in your work here. That is,

We construct our understandings of the data by investigating broad questions of "What is going on here?" and "What do we have?"
Data visualizations (tables, plots, graphs) play a central and vital role in making sense of data and the surrounding context.
Model building/hypothesis generation is an iterative process.
Use methods that are robust, resistant, smooth, and have breadth.
Maintain dispositions of skepticism, flexibility, and statistical ecumenism for methods used.
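One way to act on the "robust, resistant" point in the list above is to report resistant summaries, such as the median and interquartile range, alongside the mean and standard deviation. The sketch below uses only base R. The numbers are invented solely to show how a single extreme value pulls the mean but barely moves the median; they are not real data.

```r
# Invented values with one extreme observation (98); placeholders only.
values <- c(12, 15, 14, 13, 16, 15, 14, 98)

# Non-resistant summaries: sensitive to the extreme value.
mean_value <- mean(values)
sd_value <- sd(values)

# Resistant summaries: largely unaffected by the extreme value.
median_value <- median(values)
iqr_value <- IQR(values)

# Put the summaries side by side for comparison.
summary_table <- data.frame(
  statistic = c("mean", "sd", "median", "IQR"),
  value = c(mean_value, sd_value, median_value, iqr_value)
)
print(summary_table)
```

A large gap between the mean and the median is itself an exploratory finding worth reporting and investigating, rather than something to hide.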

The EDA portion of the work is what many people mess up, because they skip it, don't do a good job, or don't spend the time on it that they should. However, this portion of data analysis is vital for you to truly be able to communicate later types of analysis in coherent and valid ways.

For this project, we recommend that you really focus on exploring the data and proposing hypotheses/models for future analysis work.

Caution: Moving Beyond EDA

A common way that Stat 184 students introduce unnecessary problems into their course project is by attempting to move beyond EDA and use tools/methods from beyond the course: for example, different forms of regression, ANOVA, or machine learning algorithms (supervised and unsupervised). We do not recommend that you attempt to use these, even if you have seen them in another course. It has been our collective experience over the years that students who attempt to incorporate these techniques into this project 1) never do so correctly/coherently, and 2) end up focusing on those methods instead of the EDA.

Teams will not receive any points/benefit for using methods from beyond Stat 184; however, teams can lose points if such methods are used incorrectly, poorly, and/or are explained in less than ideal ways.

Frequently Asked Questions (FAQs)

I want to work alone. Can I do this project by myself?
- No. Statistics and Data Science are deeply collaborative fields, and you need to learn how to work effectively in a team setting. As such, this project was designed with collaboration and teamwork as a core component.
Can we use a past project from another course for this project?
- No. Submitting work for which you already received a grade in another course is a violation of the University's Academic Integrity policy (i.e., self-plagiarism).
Can we use a project from another (current) course for this project?
- No. There is not enough time for the faculty to meet and negotiate how one project will fully satisfy the requirements for the separate courses.
One of the team members works in a research lab. Can we use data from that lab for our project?
- YES! Provided you have permission from the lab's Principal Investigator (PI).
Can we use the data from another course's project?
- Maybe. Your team will need to disclose this as part of your data provenance and check with your instructor early on in the process. While you may use the data, you may not use the same analysis as what was used in the other course's project. Be prepared to show the other course's project upon instructor request.
Can I use Python (or another language)?
- Not as the majority language. STAT 184 is an R programming course, and the project is intended to evaluate the learning objectives of this course, so you should mostly be using R, and your entire analysis must be self-contained in a single QMD file. However, if you want to do something in the project that we have not learned about in class (using R) and prefer to use Python or some other language for that purpose, it's fine to include some Python chunks in your QMD file. Try to keep your usage of non-R programming languages under 20% (total).

Learning Objectives and Outcomes Assessed

This project is meant to provide one of the last opportunities for each student to demonstrate their growth and development. In particular, this project provides us with data on the following learning objectives and outcomes. We will be looking at several items, including the body of your rendered output file, the code appendix of your rendered output file, and your GitHub repo (linked in a comment on your submission in Canvas).

Work-in-Progress Presentations

Data Analysis: Students will develop their skills in using statistical software to engage in data analysis.
- DA.5: The student will learn to create data visualizations that support data analysis.
- DA.3: The student will learn to describe the components of data visualizations.
Communication: Students will develop their communication skills as related to statistical programming and data analysis.
- Comm.3: The student will learn to meaningfully discuss data visualizations (e.g., plots, tables) to support others in their learning about the current context.
Reproducibility: Students will develop their skills in creating reproducible code and data analyses.
- Repro.6: The student will learn to create reproducible analysis reports.
Computational Thinking: Students will develop ways of thinking that make use of R’s computing power to solve problems.
- CT.1: The student will learn to use statistical software (R) to solve problems.
Professionalism: Students will develop their professional identity through self-reflection and working with others.
- Prof.3: The student will demonstrate that they can work/collaborate effectively with others.

Report and Repo

Computational Thinking: Students will develop ways of thinking that make use of R’s computing power to solve problems.
- CT.1: The student will learn to use statistical software (R) to solve problems.
- CT.2: The student will learn to apply the PCIP System to their work.
- CT.4: The student will learn to check code for effectiveness and issues.
Programming: Students will develop their skills in programming (creating code) with statistical software.
- Prog.3: The student will learn to apply the core programming principles to their work.
- Prog.4: The student will learn to write code that works for achieving their goals.
- Prog.5: The student will learn to organize their code to assist with the code’s readability.
- Prog.6: The student will learn to implement a coding style for their code.
Data Analysis: Students will develop their skills in using statistical software to engage in data analysis.
- DA.1: The student will learn to import files into R from a variety of sources and file types.
- DA.2: The student will learn to wrangle (clean, prepare) data for further analysis.
- DA.5: The student will learn to create data visualizations that support data analysis.
- DA.6: The student will learn to generate the values of descriptive statistics that support data analysis.
- DA.7: The student will learn to identify both what makes up a case and the case attributes for different situations.
Reproducibility: Students will develop their skills in creating reproducible code and data analyses.
- Repro.2: The student will learn to apply the principles of Open Science to their work.
- Repro.3: The student will learn to assess whether data collections meet the FAIR/CARE principles.
- Repro.4: The student will learn to utilize version control tools as a means of creating reproducible and collaborative work.
- Repro.5: The student will learn to create reproducible and reusable code.
- Repro.6: The student will learn to create reproducible analysis reports.
Communication: Students will develop their communication skills as related to statistical programming and data analysis.
- Comm.1: The student will learn to generate documentation for their code that both they and others can use to help make sense of the code.
- Comm.3: The student will learn to meaningfully discuss data visualizations (e.g., plots, tables) to support others in their learning about the current context.
Professionalism: Students will develop their professional identity through self-reflection and working with others.
- Prof.2: The student will learn to attribute coding and data analysis work to different paradigms/perspectives such as Exploratory vs. Confirmatory Data Analysis, Tidyverse vs. Base R, etc.

Self- & Peer-Evaluations and Peer Feedback

Professionalism: Students will develop their professional identity through self-reflection and working with others.
- Prof.3: The student will demonstrate that they can work/collaborate effectively with others.
- Prof.5: The student will demonstrate that they can keep their (professional) commitments through actions such as attending class (on time), engaging with the class, and submitting work in a timely fashion.

Final Comment

If at any point in time you have questions, uncertainties, doubts, etc. or if you run into problems, errors, mysterious bugs, etc. talk with your instructor. We are here to help and will function as your Technical Supervisor/Mentor to your Probationary Data Scientist status. Make use of our knowledge, wisdom, and skills!
