Syllabus

Welcome to Data Science Programming II! In this course, we will learn object-oriented programming to create tree and graph data structures to represent hierarchical data and implement algorithms for efficiently searching these structures.

We'll often create our own datasets, using techniques like logging, benchmarking, web scraping, and A/B testing.

In the last third of the semester we'll explore some basic machine learning techniques, including regression, classification, clustering, and decomposition.

Additions To Syllabus Made During Semester

Course Instructor

Lectures (Meeting Time and Location)

Lecture recordings will be provided, but this is subject to change based on in-person attendance. In-person attendance is expected. Attendance will be recorded via TopHat (or another tool), and paper attendance may also be taken on random days. If attendance is healthy and it seems that people are keeping up, I'll usually post recordings. If attendance drops, I will stop posting recordings (a warning will be issued one lecture before this change).

Instructional Modality

Communication

We message the class regularly via @wisc.edu email and/or Canvas announcements. We recommend setting the "Announcement" option in your Canvas notification settings to "Notify immediately" so that you don't miss anything important. You are also expected to check your @wisc.edu email regularly.

See the help page for details about how to contact us.

We have various forms you can use to leave (optionally anonymous) feedback, report grading issues and exam conflicts, and thank TAs/mentors.

Grading

Grading breakdown

Letter Grades

At the end of the semester, we will assign final grades based on these thresholds:

We will NOT be rounding off scores at the end of the semester.

Graded Component Details

Lecture attendance

I will take attendance using TopHat (or another tool). However, if I feel that a sufficient number of students are present in the lecture, I may skip taking attendance that day to save time, and every student will be considered present.

Lab attendance

We'll post a weekly lab activities document. You can work through it individually, or with your assigned study group. TAs and peer mentors will walk around to answer questions and check your progress in finishing the lab activities. If you have extra time at the lab after completing the lab document, you can work on projects with your assigned study group.

To obtain the point for a lab, you need to submit screenshots of the work you have done so far (code and/or output) to Canvas within five minutes after the lab ends. You don't have to finish every lab activity, but you do need to make sufficient progress (as determined by the lab TA).

Projects

Submission: Everybody will individually upload a .py, .ipynb, or .zip file (as specified) for each project using the submission tool.

Collaboration: Even though everybody will make an individual submission, every project will have (1) a group part that you may optionally do with your assigned study group and (2) an individual part. For the group part, any form of help from anybody in your group is allowed; I recommend finding times when everybody in the group can work together so you can help each other through coding difficulties. You're also welcome to do the "group" part individually, or with a subset of your assigned study group. For the individual part, you may only receive help from course staff (instructors/TAs/peer mentors); you may not discuss this part with anybody else (in the class or otherwise) or get help from them.

Late Policy:

Code Review: TAs will give you comments on specific parts of your assignment. This feedback process is called a "code review", and is a common requirement in industry before a programmer is allowed to add their code changes to the main codebase. TAs will also include reasons for deductions in the comments. Read your code reviews carefully; even if you receive 100% on your work, we'll often give you tips to save effort in the future.

Project Grading: Grades will be largely based on automatic tests that we run. We'll share the tests with you before the due date, so your grade should rarely surprise you. Though it shouldn't be common, we may deduct points for serious hardcoding, not following directions, or other issues. Some bugs (called non-deterministic bugs) don't show up every time the code is run. If you have such an issue, the grade you get when we run the tester may differ from what you saw when you ran it yourself. Finally, our tests aren't very good at evaluating whether plots and other visualizations look the way they should (a human usually needs to evaluate that).
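To make the idea of a non-deterministic bug concrete, here is a small hypothetical illustration (the function names are made up for this example and are not part of any project). In Python, the iteration order of a set of strings can change from one run to the next because string hashing is randomized per process. Code that depends on that order may therefore pass the tests on your machine and fail when we run them.

```python
def unique_words_buggy(words):
    # BUG: the iteration order of a set of strings can differ between runs
    # (string hashing is randomized per process), so the order of this
    # list is not guaranteed to be the same every time.
    return list(set(words))


def unique_words_fixed(words):
    # Deterministic: sorting the unique words gives the same order every run.
    return sorted(set(words))


words = ["pear", "apple", "pear", "fig"]
print(unique_words_fixed(words))  # ['apple', 'fig', 'pear']
```

Randomness is another common source of this kind of bug. Creating a seeded generator such as `random.Random(42)` makes its sequence of values repeatable across runs.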

Auto-grader: The autograder will be run periodically during the 2 days prior to a project deadline (from Tuesday night if the deadline is on Thursday, and so on). Because of this, we expect you to submit your project early and make sure nothing crashes. However, this is not a substitute for running tester.py locally: you should only submit once you pass the tests locally.

Allowed Packages: Anything that comes pre-installed with Python, as well as any packages used during the lectures and listed in the projects, is allowed. Using unapproved packages may result in a score of zero, because the autograder won't be able to run your code without those packages.

Quizzes

There will be a short Canvas quiz due at the end of most Tuesdays. Make sure you know the rules regarding what is and is not allowed. Each quiz may be taken twice, with unlimited time per attempt (within the given number of days), and your quiz score will be the average of the two attempts.
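As a quick worked example of the averaging rule, a strong second attempt raises your score but does not replace the first. The helper below is hypothetical and assumes both attempts are taken and scored out of the same number of points:

```python
def quiz_score(first_attempt, second_attempt):
    # Hypothetical helper: the final quiz score is the mean of the two attempts
    return (first_attempt + second_attempt) / 2


# e.g., 6/10 on the first try and 10/10 on the second
print(quiz_score(6, 10))  # 8.0
```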

Allowed
NOT allowed

Midterms and Final

These will be multiple-choice exams taken on Canvas (online) with Honorlock.

Readings

We'll sometimes assign readings from the following sources (all free):

Cheating

Yeah, of course you shouldn't cheat, but what is cheating? The most common form of academic misconduct in these classes involves copying/sharing code for programming projects. Here's an overview of what you can and cannot do:

Acceptable

NOT Acceptable

Citing Code: You can copy small snippets of code from Stack Overflow (and other online references) if you cite them. For example, suppose I need to write some code that gets the median number from a list of numbers. I might search for "how to get the median of a list in python" and find a solution at https://stackoverflow.com/questions/24101524/finding-median-of-list-in-python.

I could legitimately paste code from that page into my own code, as long as it has a comment like the following:

    # copied/adapted from https://stackoverflow.com/questions/24101524/finding-median-of-list-in-python
    def median(lst):
        sortedLst = sorted(lst)
        lstLen = len(lst)
        index = (lstLen - 1) // 2  # middle index (lower middle for even lengths)

        if lstLen % 2:
            # odd length: there is a single middle element
            return sortedLst[index]
        else:
            # even length: average the two middle elements
            return (sortedLst[index] + sortedLst[index + 1]) / 2.0

In contrast, copying from a nearly complete project (that accomplishes what you're trying to do for your project) is not OK. When in doubt, ask us! The best way to stay out of trouble is to be completely transparent about what you're doing.

Similarity Detection: We will use automated tools to look for similarities across submissions. We take cheating detection seriously to make the course fair to students who put in the honest effort.