Bootcamp

The Summer STEM Institute (SSI) bootcamp teaches students how to conduct interdisciplinary data science research projects. Data science is becoming increasingly important for making discoveries and uncovering significant patterns in fields ranging from biology to environmental science to computer security. Students in the bootcamp will be equipped with the skills to apply data science to their personal scientific fields of interest.

The bootcamp consists of two intensive courses: Conducting Data Science Research and Programming for Data Science. Through the Conducting Data Science Research course, students learn how to develop research questions, formulate project ideas, and create research proposals. Through the Programming for Data Science course, students are taught the technical skills to implement research projects.

Conducting Data Science Research

Learn to design, conduct, and present a data science research project

The Conducting Data Science Research course teaches students how to design and conduct interdisciplinary data science research projects from start to finish. Students will learn how to choose an initial field of interest, identify and interpret relevant background readings and literature, and find, augment, and scrape workable datasets. Students will be taught how to ask relevant research questions and decide which modern statistical and machine learning tools can be applied to develop and support answers to these questions. Finally, students will be taught how to write academic research papers, create research posters, and deliver scientific presentations. Conducting research at a young age can be difficult, and the course is designed specifically to break down the research process for high school students. Lessons in the course will draw on the instructing team’s past experiences entering science fair competitions and conducting research from a young age.

Dhruvik Parikh

Course Instructor

Dhruvik is an undergraduate at Stanford University studying economics and computer science. In high school, Dhruvik won the Young Scientist Award (top 3 overall; $50,000 scholarship) at the International Science and Engineering Fair (ISEF). He placed second nationally at the National Junior Science and Humanities Symposium (JSHS) and has also been named a Forbes 30 Under 30 recipient. For his research, he was invited to be a speaker at a TEDx conference. At Stanford, Dhruvik has conducted machine learning research at Stanford's Sustainability and Artificial Intelligence Laboratory. He is also the co-director of the Stanford Cleantech Challenge and is the Vice President of Projects for Stanford Energy Club. Dhruvik has worked as a software engineer at Voya Sol, Microsoft, and Kalshi. Previously, Dhruvik conducted chemical engineering research at the MIT Hamel Lab and computational biology research at the University of Washington.

Franklyn Wang

Assistant Instructor

Franklyn is an undergraduate and teaching assistant for graduate-level probability courses at Harvard University. Franklyn was the mentor of the student who placed 3rd in the nation at the 2019 Regeneron Science Talent Search (STS). In high school, Franklyn was named a Davidson Fellow (top 12 nationally; $25,000 scholarship), a Regeneron Science Talent Search (STS) Finalist (top 40 nationally; $25,000 scholarship), and the 2nd Place National Winner in the Siemens Competition in Math, Science, and Technology ($50,000 scholarship). In addition, Franklyn placed in the top 5 nationally in the USA Computing Olympiad (USACO), the top 20 nationally in the USA Math Olympiad (USAMO), and the top 50 nationally in the USA Physics Olympiad (USAPhO). He was named a Goldwater Scholar, the most prestigious fellowship in the natural sciences, mathematics, and engineering. He is also the primary author of a paper published in Operations Research Letters.

Anne Lee

Assistant Instructor

Anne is an undergraduate studying computer science at Stanford University. In high school, Anne was invited to attend the Research Science Institute (RSI), a highly selective research fellowship hosted by MIT. At RSI, Anne conducted research at MIT's Computational Materials Design Lab and was the only student in her RSI class to receive top 5 awards for both her final paper and final presentation. In addition, Anne was named a Semifinalist in the Siemens Competition in Math, Science, and Technology and received First Place at the United Nations Sustainable Development Contest. She was also recognized as a #include fellow by she++, an organization for women in technology at Stanford University. In college, Anne's project was recognized as the top project in Stanford's CS109 (Probability for Computer Scientists) out of 100+ project submissions. She also received the energyCatalyst grant from the Stanford TomKat Center for Sustainable Energy to pursue computer vision research.

Programming for Data Science

Learn the programming skills to conduct interdisciplinary data science research projects

The Programming for Data Science course empowers students with the technical programming skills to conduct data science research. Students will learn the fundamentals of data cleaning, manipulation, and visualization. Students will be taught the essentials of statistical analysis, including how to design and apply exploratory data analysis, confidence intervals, and hypothesis testing. Students will also learn tools in the modern machine learning toolbox in both supervised and unsupervised settings including decision trees, clustering algorithms, and dimensionality reduction. The course is designed to be accessible to students with no programming background, and students will be taught Python fundamentals, data science libraries (including numpy, pandas, matplotlib, sklearn), and LaTeX typesetting.

Alex Tsun

Course Instructor

Alex is currently a machine learning and relevance engineer at LinkedIn. Alex has been a lecturer at the Paul G. Allen School of Computer Science & Engineering at the University of Washington, where he redesigned CSE 312: Probability & Statistics for Computer Scientists. To improve the course, Alex developed a new textbook, presentations, problem sets, autograders, and lectures. Previously, Alex served as a teaching assistant and course assistant a total of 13 times at Stanford University and the University of Washington, where he received the Bob Bandes Memorial Student Teaching Award (awarded to <1% of TAs each year). Alex has also worked as a data scientist at Facebook, a software engineer at Google, and a research assistant at the Graphics and Imaging Lab and Washington Experimental Mathematics Laboratory at the University of Washington.

Aleks Jovčić

Assistant Instructor

Aleks is an undergraduate at the University of Washington studying computer science. Aleks has always been passionate about teaching and computer science education, and he has held numerous computer science teaching roles. At the University of Washington, Aleks has worked as the head teaching assistant for CSE 312: Probability & Statistics for Computer Scientists, a course on discrete probability, randomness, and computer science theory. He has also worked as a teaching assistant for CSE 163: Intermediate Data Programming, a course on data programming and the ecosystem of publicly available data science tools and libraries. In his free time, Aleks enjoys filmmaking, running, and game development. Previously, Aleks worked as a game designer at Skyglow Games alongside a team of other game developers, designers, and artists to ship video game products.

Amy Jin

Assistant Instructor

Amy is an undergraduate at Harvard University studying computer science. In high school, Amy conducted computer vision research at the Stanford Artificial Intelligence Laboratory (SAIL). For her work, Amy was awarded the Davidson Fellowship (top 4 nationally; $50,000 scholarship), was a Regeneron Science Talent Search (STS) Scholar, and was a Semifinalist in the Siemens Competition in Math, Science, and Technology. Amy presented her research at the IEEE Winter Conference on Applications of Computer Vision (WACV) and won Best Paper in the Machine Learning for Health Workshop at the Conference on Neural Information Processing Systems (NeurIPS). She attended the International Science and Engineering Fair (ISEF), where she received the First Geno Award and the Second Award in the Robotics and Intelligent Machines category. Amy has also worked as a software engineer at Expedia and Facebook.

Bootcamp Structure

Lectures

Lectures for Conducting Data Science Research and Programming for Data Science are each one hour in length and will take place daily from Monday through Friday.

Assignments

Students will receive weekly homework that combines programming assignments and research deliverables. Programming assignments will give students hands-on experience working with datasets. Research deliverables will guide students through the process of conducting background research, formulating a research proposal, and developing their scientific presentation and communication skills.

Discussion Section

In addition to daily lectures, discussion sections will take place Monday through Thursday. Discussion sections will review key concepts and questions from the bootcamp and also dive deeper into more advanced topics. Discussion sections are optional, but students are strongly encouraged to attend to review concepts and practice their skills.

Office Hours

Teaching staff and research mentors will host office hours throughout the week. Students can attend office hours to get help on any questions they may have.

Discussion Board

In addition to office hours, students can reach out to course instructors and teaching assistants through a 24/7 virtual discussion board where they can ask questions about lectures and homework assignments, receive feedback, and receive assistance debugging code.

Research Wikis

Students will have access to a research wiki consisting of datasets and starting resources to help them get started with research projects. The research wiki contains resources across a variety of topics and categories, including public health, biology, physics, technology, social science, aerospace, sustainability, and economics.

Research Workshops

Every Sunday of the program, students may attend research workshops hosted throughout the day. Research workshops are led by SSI research mentors, and each workshop will introduce students to a new field of research or teach students about some aspect of the research process such as the publication process or how to choose a research lab. Workshops are designed to help students explore research questions and develop project ideas in a variety of fields.

Post-SSI Research Tutorials

For two weeks after the end of the program, students have access to post-program research tutorials to help them get started with conducting research in a variety of fields. During this two-week period, students can work on research tutorials in a self-paced manner, and staff will remain active on the virtual discussion board to help students who have any questions.