The core of SSI is a 6-week summer Bootcamp that gives students the foundations to conduct data science research. The Bootcamp will require approximately 20 hours of work per week. Each course will have daily lectures and weekly homework assignments. Students will also have access to teaching assistants in review sessions and to discussion boards, available 24/7, where they can ask questions.

*How to design and conduct a data science project*

This course is designed to teach high school students how to complete data science research projects from beginning to end. Students will learn how to define a problem in a field of interest, identify relevant datasets, and write a research proposal supported by the literature. Students will also learn the modern algorithmic and visualization tools used in data science projects. Finally, students will learn how to write publishable research papers, create research posters, and deliver scientific presentations. There will be 5 lectures a week and weekly homework assignments. The teaching team is excited to be designing an innovative curriculum based on their personal experience with research and with high school science fairs and competitions. Course homework assignments will incorporate data science project resources being developed by Eric Zhang, a two-time gold medalist at the International Olympiad in Informatics (IOI), and Ben Choi, a two-time National Science Bowl winner.

*Course Instructor*

Franklyn is a student at Harvard and a teaching assistant for graduate-level probability courses there. Franklyn mentored the student who placed 3rd nationally at the 2019 Regeneron Science Talent Search (STS). In high school, Franklyn was named a Davidson Fellow (top 12 nationally; $25,000 scholarship), an STS Finalist (top 40 nationally; $25,000 scholarship), and a 2nd Place National Winner in the Siemens Competition ($50,000 scholarship). Franklyn also placed in the top 5 nationally in the USA Computing Olympiad, the top 20 nationally in the USA Math Olympiad, and the top 50 nationally in the USA Physics Olympiad.

*Assistant Instructor*

Anne is a student studying computer science and sustainability at Stanford. In high school, Anne was invited to attend the Research Science Institute (RSI), a highly selective high school research fellowship hosted by MIT (top 80 of 1,600 applicants). At RSI, Anne conducted research in MIT's Computational Materials Design Lab and was the only student in her RSI class to receive top awards for both her paper and her presentation.

**Course Outline**

*Week 1: Defining a Research Problem*

**1.1:** What Is Data Science Research?, Choosing a Field, Starting with the Question, Not the Data

**1.2:** Case Studies on Asking Research Questions

1.3:

1.4:

1.5:

*Week 2: Identifying Datasets and Writing a Research Proposal*

**2.1:** Finding Data for a Problem, Surveying Data, Properties of Good and Bad Data, Creating Our Own Data, Cleaning Data, What to Do When We Can’t Find Good Data

**2.2:** Case Studies on Finding Datasets

2.3:

2.4:

2.5:

*Week 3: Exploratory Data Analysis and Types of Data Science Research*

**3.1:** Exploratory Data Analysis

**3.2:** Case Studies on Exploratory Data Analysis

3.3:

3.4:

3.5:

*Week 4: After the Basics: Advancing Your Project*

**4.1:** Correlation Does Not Imply Causation: The Art of Causal Inference

**4.2:** Statistics Crash Course: Tests of Significance

4.3:

4.4:

4.5:

*Week 5: Writing a Research Paper*

**5.1:** Writing a Research Paper (Part I: Introduction, Purpose, Methodology)

**5.2:** Case Studies on Research Papers (Part I)

5.3:

5.4:

5.5:

*Week 6: Research Poster and Presentation*

**6.1:** Science Research Posters

**6.2:** Case Studies on Science Research Posters

6.3:

6.4:

6.5:

*Teaching the programming skills to conduct interdisciplinary data science research projects*

Data science is transforming many sectors of science and technology. In the future, a strong understanding of computer science and data will play an increasingly critical role in making significant breakthroughs in both research and technology. This course will teach students computer programming for data science. The course will be highly applied, focusing on the practical skills needed to conduct interdisciplinary data science research projects. Most course topics will be taught through examples using real research datasets. There will be 5 lectures a week and weekly homework assignments.

*Course Instructor*

Alex is a Stanford Course Assistant and seasoned instructor who has served 13 times as a teaching assistant for "Probability for Computer Scientists" at UW Seattle and Stanford. He is currently pursuing an M.S. in Computer Science at Stanford with a specialization in AI and theoretical computer science (GPA: 4.06/4.00). As an undergraduate, Alex completed a triple major in computer science, statistics, and mathematics. Alex has previously worked as a machine learning researcher at LinkedIn, a data scientist at Facebook, and a software engineer at Google.

*Assistant Instructor*

Adam is a former Stanford Course Assistant who has previously worked in AI research at NVIDIA and in data science at Point72. At Stanford, Adam studied data science and AI, completed his degree in 2.5 years, and graduated in the top 8% of the engineering school by academic GPA (GPA: 4.07/4.00). As an undergraduate, he was a course assistant for a graduate-level cryptography course. Adam has excelled in some of Stanford's most rigorous graduate-level computer science courses, including Cryptography (top 5%), Machine Learning (top 5%), Convolutional Neural Networks for Visual Recognition (top 10%), and Mining Massive Datasets (top 10%).

**Course Outline**

*Week 1: Introduction to Python*

**1.1:** Introduction to Python, Variables

**1.2:** For Loops and Nested Loops

1.3:

1.4:

1.5:

*Week 2: Advanced Python*

**2.1:** Lists, List Comprehensions, and Sorting

**2.2:** Sets and Dictionaries

2.3:

2.4:

2.5:

*Week 3: Data Wrangling and Visualization*

**3.1:** Mathematical Typesetting with LaTeX

**3.2:** Data Types (Categorical, Continuous), Data Formats (CSV, JSON), and Reading and Writing Data in Python

3.3:

3.4:

3.5:

*Week 4: Machine Learning: Unsupervised Learning*

**4.1:** Introduction to Probability

**4.2:** Introduction to Machine Learning: Motivation, Tasks, Demo

4.3:

4.4:

4.5:

*Week 5: Machine Learning: Supervised Learning*

**5.1:** Dataset Case Study: Regression Tasks, Linear Regression

**5.2:** Dataset Case Study: Linear Regression with Regularization, Polynomial Regression

5.3:

5.4:

5.5:

*Week 6: Advanced Topics and Next Steps*

**6.1:** Dataset Case Study: Decision Trees, Ensemble Methods

**6.2:** Dataset Case Study: Neural Networks and Deep Learning

6.3:

6.4:

6.5:

Course instructors and teaching assistants will hold review sessions throughout the week. Students can attend these sessions to ask the teaching staff questions about lectures or homework assignments. Each week, there will also be review sessions that go over the programming and research homework assignments and address common student questions.

In addition to review sessions, students can reach out to course instructors and teaching assistants through Piazza, an online discussion board. Students can post questions at any time, and a member of the teaching team will respond as soon as possible. Through Piazza, students can quickly get feedback, ask questions about homework, and receive help debugging code.