The Summer STEM Institute (SSI) bootcamp teaches students how to conduct interdisciplinary data science research projects. Data science is becoming increasingly important for making discoveries and uncovering significant patterns in a range of fields, from biology to environmental science to computer security. Students in the bootcamp are equipped with the skills to apply data science to their own scientific fields of interest.

The bootcamp consists of two intensive courses: Conducting Data Science Research and Programming for Data Science. Through the Conducting Data Science Research course, students learn to develop research questions, formulate project ideas, and create research proposals. Through the Programming for Data Science course, students are taught the programming and machine learning skills needed to implement research projects.

Conducting Data Science Research

Learn to design, conduct, and present a data science research project

The Conducting Data Science Research course teaches students how to design and conduct interdisciplinary data science research projects from start to finish. Students learn how to choose an initial field of interest, how to identify and interpret relevant background readings and literature, and how to find, augment, and scrape workable datasets. Students are taught how to ask relevant research questions and how to decide which modern statistical and machine learning tools can be applied to develop and support answers to these questions. Students are also taught techniques in exploratory data analysis and inferential statistics that can be used to answer research questions. Finally, students are taught how to write academic research papers, create research posters, and deliver scientific presentations. Conducting research at a young age can be especially difficult, and the course is designed specifically to break down the research process for high school students. Lessons in the course incorporate the instructing team’s own experiences entering science fair competitions and conducting research from a young age.

Dhruvik Parikh

Course Instructor

Dhruvik is an undergraduate at Stanford University studying economics and computer science. In high school, Dhruvik won the Young Scientist Award (top 3 overall; $50,000 scholarship) at the International Science and Engineering Fair (ISEF). He placed second nationally at the National Junior Science and Humanities Symposium (JSHS) and has also been named a Forbes 30 Under 30 recipient. For his research, he was invited to speak at a TEDx conference. At Stanford, Dhruvik has conducted machine learning research at Stanford's Sustainability and Artificial Intelligence Laboratory. He is also the co-director of the Stanford Cleantech Challenge and the Vice President of Projects for Stanford Energy Club. Dhruvik has worked as a software engineer at Voya Sol, Microsoft, and Kalshi. Previously, Dhruvik conducted chemical engineering research at the MIT Hamel Lab and computational biology research at the University of Washington.

Franklyn Wang

Assistant Instructor

Franklyn is an undergraduate and teaching assistant for graduate-level probability courses at Harvard University. Franklyn was the mentor of the student who placed 3rd in the nation at the 2019 Regeneron Science Talent Search (STS). In high school, Franklyn was named a Davidson Fellow (top 12 nationally; $25,000 scholarship), a Regeneron Science Talent Search (STS) Finalist (top 40 nationally; $25,000 scholarship), and the 2nd Place National Winner in the Siemens Competition in Math, Science, and Technology ($50,000 scholarship). In addition, Franklyn placed in the top 5 nationally in the USA Computing Olympiad (USACO), the top 20 nationally in the USA Math Olympiad (USAMO), and the top 50 nationally in the USA Physics Olympiad (USAPhO). He was named a Goldwater Scholar, a recipient of the most prestigious scholarship in the natural sciences, mathematics, and engineering. He is also the primary author of a paper published in Operations Research Letters.

Anne Lee

Assistant Instructor

Anne is an undergraduate studying computer science at Stanford University. In high school, Anne was invited to attend the Research Science Institute (RSI), a highly selective research fellowship hosted by MIT. At RSI, Anne conducted research at MIT's Computational Materials Design Lab and was the only student in her RSI class to receive top 5 awards for both her final paper and final presentation. In addition, Anne was named a Semifinalist in the Siemens Competition in Math, Science, and Technology and received First Place at the United Nations Sustainable Development Contest. She was also recognized as a #include fellow by she++, an organization for women in technology at Stanford University. In college, Anne's project was recognized as the top project in Stanford's CS109 (Probability for Computer Scientists) out of 100+ submissions. She also received the energyCatalyst grant from Stanford's TomKat Center for Sustainable Energy to pursue computer vision research.

Programming for Data Science

Learn the programming skills to conduct interdisciplinary data science research projects

The Programming for Data Science course equips students with the technical programming skills to conduct data science research. Students learn the fundamentals of Python programming as well as the different parts of data engineering, including data cleaning, data manipulation, and data visualization. Students are also taught methods from the modern machine learning toolbox and how they can be used to answer questions with datasets. Students learn how to design machine learning models for supervised settings and how to apply techniques such as multiple linear regression, logistic regression, and decision trees to these settings. Students are also taught how to work with data in unsupervised settings, including techniques for clustering and dimensionality reduction. In addition to Python and Jupyter Notebook, students gain practice with the modern ecosystem of data science libraries such as numpy, pandas, matplotlib, and sklearn. The course is designed to be accessible to students with no prior programming experience.

Erich Liang

Course Instructor

Erich is an undergraduate at Caltech studying computer science and mathematics. Previously, Erich worked at Microsoft Research, where he published his research in IEEE and presented at the International Conference on Computational Science and Computational Intelligence (CSCI). Erich has also worked on machine learning at Bloomberg, data science at Versium and HVF Labs, and AR/VR at Facebook. At Caltech, Erich has worked on multiple research projects in the Caltech Rigorous Systems Research Group (RSRG) and the Caltech Computational Vision Laboratory. He currently works in Dr. Katie Bouman’s research group on improving black hole video reconstruction algorithms. Additionally, Erich is the Chairman of the Board of the Caltech Data Science Club, and he has also received the “Best Deep Learning Hack” award at TreeHacks, Stanford University’s annual hackathon.

Aleks Jovčić

Assistant Instructor

Aleks is an undergraduate at the University of Washington studying computer science. Aleks has always been passionate about teaching and computer science education, and he has held numerous computer science teaching roles. At the University of Washington, Aleks has worked as the head teaching assistant for CSE 312: Probability & Statistics for Computer Scientists, a course on discrete probability, randomness, and computer science theory. He has also worked as a teaching assistant for CSE 163: Intermediate Data Programming, a course on data programming and the ecosystem of publicly available data science tools and libraries. In his free time, Aleks enjoys filmmaking, running, and game development. Previously, Aleks worked as a game designer at Skyglow Games alongside a team of other game developers, designers, and artists to ship video game products.

Amy Jin

Assistant Instructor

Amy is an undergraduate at Harvard University studying computer science. In high school, Amy conducted computer vision research at the Stanford Artificial Intelligence Laboratory (SAIL). For her work, Amy was awarded the Davidson Fellowship (top 4 nationally; $50,000 scholarship), was a Regeneron Science Talent Search (STS) Scholar, and was a Semifinalist in the Siemens Competition in Math, Science, and Technology. Amy presented her research at the IEEE Winter Conference on Applications of Computer Vision (WACV) and won Best Paper at the Machine Learning for Health Workshop at the Conference on Neural Information Processing Systems (NeurIPS). She attended the International Science and Engineering Fair (ISEF), where she received the First Geno Award and the Second Award in the Robotics and Intelligent Machines category. Amy has also worked as a software engineer at Expedia and Facebook.

Alex Tsun

Guest Lecturer

Alex is currently a machine learning and relevance engineer at LinkedIn. Alex has been a lecturer at the Paul G. Allen School of Computer Science & Engineering at the University of Washington, where he redesigned CSE 312: Probability & Statistics for Computer Scientists. To improve the course, Alex developed a new textbook, presentations, problem sets, autograders, and lectures. Previously, Alex served as a teaching assistant and course assistant a total of 13 times at Stanford University and the University of Washington, where he received the Bob Bandes Memorial Student Teaching Award (awarded to <1% of TAs each year). Alex has also worked as a data scientist at Facebook, a software engineer at Google, and a research assistant at the Graphics and Imaging Lab and Washington Experimental Mathematics Laboratory at the University of Washington. In addition to guest lecturing at SSI this summer, Alex will be teaching computer science at Stanford University.

Bootcamp Structure


Lectures for Conducting Data Science Research and Programming for Data Science take place daily from Monday through Friday. During lectures, students will be expected to participate in code-alongs to practice their programming skills.


Students receive weekly homework that combines programming assignments and research deliverables. Programming assignments give students hands-on experience working with datasets. Research deliverables guide students through the process of conducting background research, formulating a research proposal, and developing their scientific writing and presentation skills.

Discussion Section

In addition to daily lectures, discussion sections take place on Monday through Thursday. Discussion sections review key concepts and questions from the bootcamp and also dive deeper into more advanced topics. Discussion sections are optional, but students are strongly encouraged to attend to review concepts and practice their skills.

Office Hours

Teaching staff and research mentors host office hours in two time slots each day, Monday through Friday. Students can attend office hours to ask questions about the course material, get help with homework assignments and debugging code, and receive assistance with any other questions they may have.

Discussion Board

In addition to office hours, students can reach out to course instructors and teaching assistants through a 24/7 virtual discussion board to ask questions about lectures and homework assignments, obtain feedback on work, and receive help debugging code.

Starter Dataset Wiki

One of the most challenging aspects of conducting data science research for students is identifying appropriate datasets to work with. To help students get started with research projects, SSI provides students with a starter dataset wiki that contains a compilation of cleaned and pre-processed datasets across a variety of categories, including public health, medical diagnostics, computer development, internet traffic and security, politics, climate change, and energy.

Research Workshops

Every Sunday of the program, research workshops on a variety of topics are hosted throughout the day. Students are encouraged to look through the list of workshop topics at the start of every week of the program to decide which talks would be most interesting for them to attend. Research workshops are led by SSI research mentors and are designed to introduce students to different fields of research and shed light on various aspects of the research process. Previous workshops have included "The Academic Publication Process", "AI & Medicine: Obstacles and Prospects", and "Introduction to Reinforcement Learning."

Post-SSI Research Tutorials

For two weeks after the end of the program, students will have access to post-program research tutorials to help them get started with conducting research in a variety of fields. During this two-week period, students can work on research tutorials in a self-paced manner, and staff will remain active on the virtual discussion board to help students who have any questions along the way.