Syllabus
Welcome to Data Science Programming II! In this course, we will learn object-oriented programming to create tree and graph data structures to represent hierarchical data and implement algorithms for efficiently searching these structures.
We'll often create our own datasets, using techniques like logging, benchmarking, web scraping, and A/B testing.
In the last third of the semester we'll explore some basic machine learning techniques, including regression, classification, clustering, and decomposition.
Additions To Syllabus Made During Semester
- none yet
Course Instructor
- Dr. Gurmail Singh (Teaching Faculty - Department of Computer Sciences) gurmail.singh@wisc.edu
Lectures (Meeting Time and Location)
- Online (via Zoom) MTWR 11:00 a.m. - 12:15 p.m.
Lecture recordings will usually be provided, but this is subject to change based on attendance. Synchronous online attendance is expected and will be recorded via TopHat (or another tool) or through Zoom logs. If attendance is healthy and it feels like people are keeping up, I'll usually post recordings. If attendance drops, I will stop posting recordings (a warning will be issued one lecture before this change). Additionally, if there are any technical issues, lecture recordings may not be provided.
Instructional Modality
- LEC001: online (via Zoom)
Communication
We message the class regularly via @wisc.edu email and/or Canvas announcements. We recommend setting the "Announcement" option in your Canvas settings to "Notify immediately" so that you don't miss something important. You are also expected to check your @wisc.edu email regularly.
See the help page for details about how to contact us.
We have various forms for you to leave (optionally anonymous) feedback, report grading issues and exam conflicts, and thank TAs/mentors.
Grading
Grading breakdown
- 49% - 7 projects (7% each) - no score drops
- 11% - exam 1 online (with Honorlock)
- 11% - exam 2 online (with Honorlock)
- 11% - exam 3 online (with Honorlock)
- 7% - 7 online quizzes (1% each) - one lowest score drops
- 6% - lab attendance - 3 lowest scores drop
- 4% - lecture attendance (20% score drops)
- 1% - class surveys
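As a rough illustration of how the weights above combine, here is a minimal Python sketch. The function name, the dictionary keys, and the example scores are all hypothetical, and the sketch assumes each component score has already been adjusted for any dropped scores; it is not the official grade calculation.

```python
# Weights from the grading breakdown above (percent of the final grade).
WEIGHTS = {
    "projects": 49,
    "exam1": 11,
    "exam2": 11,
    "exam3": 11,
    "quizzes": 7,
    "lab_attendance": 6,
    "lecture_attendance": 4,
    "surveys": 1,
}


def final_percentage(component_scores):
    """Combine per-component scores into a final percentage.

    Each score is a fraction from 0.0 to 1.0 and is assumed to already
    reflect any dropped scores (hypothetical helper, not course code).
    """
    assert sum(WEIGHTS.values()) == 100  # the breakdown adds up to 100%
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


# Made-up scores for a hypothetical student.
example = {
    "projects": 0.90, "exam1": 0.80, "exam2": 0.85, "exam3": 0.75,
    "quizzes": 1.00, "lab_attendance": 1.00, "lecture_attendance": 1.00,
    "surveys": 1.00,
}
print(final_percentage(example))  # roughly 88.5
```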
Letter Grades
At the end of the semester, we will assign final grades based on these thresholds:
- 93% - 100%: A
- 88% - 92.99%: AB
- 80% - 87.99%: B
- 75% - 79.99%: BC
- 70% - 74.99%: C
- 60% - 69.99%: D
We will not be rounding off scores at the end of the semester.
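The thresholds above can be sketched as a simple lookup. This is only an illustration: the function name is hypothetical, and two details are assumptions the syllabus does not state, namely that scores below 60% map to F and that a score falling between two listed ranges (for example 92.995%) gets the lower grade because no rounding is applied.

```python
# Lower bounds from the thresholds above, highest first.
THRESHOLDS = [(93, "A"), (88, "AB"), (80, "B"), (75, "BC"), (70, "C"), (60, "D")]


def letter_grade(percentage):
    """Map a final percentage to a letter grade without rounding."""
    for lower_bound, letter in THRESHOLDS:
        if percentage >= lower_bound:
            return letter
    # Assumption: the syllabus lists no grade below 60%; F is the usual label.
    return "F"


print(letter_grade(92.99))  # AB, since scores are not rounded up to 93
```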
Graded Component Details
Lecture attendance
I will take attendance using TopHat (or another tool).
However, if I feel that a sufficient number of students are present in the lecture, I may skip attendance that day to save time, and every student will be considered present on that day.
Lab attendance
We'll post a lab activities document on the dates mentioned on the schedule page. Labs will be conducted online via Zoom; the link will be available on Canvas. You can work through the document individually or with your assigned study group. TAs and peer mentors will be available online (via Zoom) to answer questions. If you have extra time at the lab after completing the lab document, you can work on projects with your assigned study group.
To obtain the point for a lab, you need to submit screenshots of the work (code and/or running results) you have done so far to Canvas within ten minutes after the lab ends. You don't have to finish every lab activity, but sufficient progress (as determined by the lab TA) is required.
Projects
Submission: Everybody will individually upload a .py, .ipynb, or zip file (as specified) for each project.
Collaboration: Even though everybody will make an individual submission, every project will have (1) a group part to be optionally done with your assigned study group and (2) an individual part. For the group part, any form of help from anybody in your group is allowed; I recommend you find times for everybody in the group to work at the same time so you can help each other through coding difficulties in this part. You're also welcome to do the "group" part individually, or with a subset of your assigned study group. For the individual part, you may only receive help from course staff (instructors/TAs/peer mentors); you may not discuss this part with anybody else (in the class or otherwise) or get help from them.
Late Policy:
- Students have a bank of 12 late days for the semester.
- For a given project, you may use 3 late days without any deduction. After that, a 5% deduction applies per late day for the next 2 days. Projects that are more than 5 days late will not be accepted.
- After the bank runs out, a 5% deduction will be applied per late day.
- You may not be allowed to use any late days on the last project.
- Late days only apply to projects. They do not apply to Quizzes.
- Late days are automatically applied and do not need to be requested.
- Late days are calculated as whole days. That is, even if your project is late by 2 hours, that counts as 1 whole late day.
- For calculating late days, we will always consider your last possible submission (prior to manual code review). We will not be accepting requests to grade a prior submission for the same project.
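To make the late-day arithmetic concrete, here is a minimal Python sketch of one possible reading of the policy above. The function names are hypothetical, and the sketch assumes that only penalty-free days are drawn from the 12-day bank; the syllabus does not spell out that detail, so treat this as an illustration rather than the official calculation.

```python
import math

MAX_FREE_DAYS_PER_PROJECT = 3  # penalty-free late days per project
MAX_LATE_DAYS_ACCEPTED = 5     # projects later than this are not accepted
DEDUCTION_PER_DAY = 5          # percent deducted per penalized late day


def late_days(hours_late):
    """Late days are whole days: any fraction of a day counts as a full day."""
    return max(0, math.ceil(hours_late / 24))


def late_penalty(hours_late, bank_remaining):
    """Return (accepted, deduction_percent, new_bank_remaining).

    Assumption: only penalty-free days are drawn from the bank.
    """
    days = late_days(hours_late)
    if days > MAX_LATE_DAYS_ACCEPTED:
        return False, None, bank_remaining
    free = min(days, MAX_FREE_DAYS_PER_PROJECT, bank_remaining)
    deduction = DEDUCTION_PER_DAY * (days - free)
    return True, deduction, bank_remaining - free


# 2 hours late counts as 1 whole late day, covered by the bank.
print(late_penalty(2, 12))       # (True, 0, 11)
# 4 days late: 3 free days, then one day at 5%.
print(late_penalty(4 * 24, 12))  # (True, 5, 9)
```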
Code Review: TAs will give you comments on specific parts of your assignment. This feedback process is called a "code review", and is a common requirement in industry before a programmer is allowed to add their code changes to the main codebase. TAs will also include reasons for deductions in the comments. Read your code reviews carefully; even if you receive 100% on your work, we'll often give you tips to save effort in the future.
Project Grading: Grades will be largely based on automatic tests that we run. We'll share the tests with you before the due date, so you should rarely be too surprised by your grade. Though it shouldn't be common, we may deduct points for serious hardcoding, not following directions, or other issues. Some bugs (called non-deterministic bugs) don't show up every time code is run; if you have such an issue, the grade from our tester may differ from what you expected based on your own runs. Finally, our tests aren't very good at evaluating whether plots and other visualizations look how they should (a human usually needs to evaluate that). Note: to get your project graded, you must do the following:
- submit the project on GitLab (a link is provided on the Labs and Projects page),
- make a merge request for the project,
- make sure the pipeline output shows your grade.
That is, to get a project graded, you must submit your project as described on the Labs and Projects page and make a merge request for the project. Additionally, you must make sure that the pipeline passes and reports your grade. You should be able to see the grade given to you by the autograder.
Auto-grader: The autograder will run soon (usually within a minute) after your project submission. It usually grades a project within 1 minute, but it may take around 2 minutes. If a project takes more than 10 minutes to be autograded, that project will be given a grade of zero. You must check the output of the pipeline to see the grade given to you by the autograder. If the autograder does not give you a grade, you are given a grade of zero. Note that manual grade deductions will be applied on top of deductions made by the autograder. We expect you to try submitting your project early and make sure nothing crashes. However, this should not be a substitute for running tester.py or grader.py locally.
- Clearing the auto-grader is a mandatory part of the project submission process. Regular project deadlines apply to autograder failures as well. That is, your project submission must clear the auto-grader before the hard deadline for a project. If it does not, we are unable to grade your project submission.
- If your project fails the auto-grader, it is your responsibility to use office hours and make an appropriate resubmission. The resubmission will also count towards late day usage.
Allowed Packages: anything that comes pre-installed with Python and any packages used during the lectures and listed in the projects are allowed. Using unapproved packages may result in a score of zero when submitted for grading because the autograder won't be able to run your code without those packages.
Quizzes
There will be 7 quizzes due on the dates specified on the schedule page. Make sure you know the rules regarding what is allowed and what is not. Each quiz may be taken twice with unlimited time (within the given number of days), but the quiz score will be the average of the two attempts.
Allowed
- however much time you need during the specified days
- discussing answers with members of your assigned study group who are taking the quiz at the same time
- referencing texts, notes, or provided course materials
- searching online for general information
- running code
NOT allowed
- taking it more than twice
- discussing answers with anybody outside of your group
- discussing with members of your group who have already completed the quiz when you haven't completed it yourself yet
- posting anything online about the quizzes
- using such material potentially posted by other 320 students who broke the preceding rule
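The quiz scoring rule described above (the average of the attempts, with the lowest quiz score dropped per the grading breakdown) can be sketched as follows. The function names and example scores are hypothetical, and two points are assumptions the syllabus does not state: a quiz taken only once is scored by that single attempt, and the remaining quizzes are averaged equally after the drop.

```python
def quiz_score(attempts):
    """Average the scores of the attempts taken (at most two are allowed)."""
    if not 1 <= len(attempts) <= 2:
        raise ValueError("a quiz may be taken once or twice")
    return sum(attempts) / len(attempts)


def quiz_component(quiz_scores):
    """Average the quiz scores after dropping the single lowest one."""
    kept = sorted(quiz_scores)[1:]  # drop the lowest score
    return sum(kept) / len(kept)


# Made-up scores out of 10 for illustration.
print(quiz_score([8, 10]))                         # 9.0
print(quiz_component([10, 9, 8, 10, 6, 10, 10]))   # 9.5
```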
Midterms and Final
These will be multiple-choice exams taken on Canvas (online exams) with Honorlock during the times specified on the schedule page.
- Midterm exam 1: Wednesday, July 3rd, during 2:30 pm - 7:15 pm.
- Midterm exam 2: Thursday, July 18th, during your lecture time.
- Final exam:
  - Regular exam: Monday, August 5th, 7:15 am - 9:15 am.
  - McBurney exam: Monday, August 5th, 7:15 am - 11:15 am.
  - Alternate exam: Monday, August 5th, 9:15 am - 11:15 am.
Note 2: Probably less than the scheduled time will be used for the final exam.
Note 3: If you are a student with a McBurney visa (affiliated with the McBurney Center) and you have an accommodation that allows you more than 1.5x time, then please fill out the exam conflict form and you will be allowed to take a midterm exam in the evening (or at any other time after the regular exam) on the day of the midterm exam.
Readings
We'll sometimes assign readings from the following sources (all free):
- Think Python 2nd Edition by Allen B. Downey: Read Online
- Automate the Boring Stuff with Python by Al Sweigart: Read Online
- Principles and Techniques of Data Science by Sam Lau, Joey Gonzalez, and Deb Nolan: Read Online
- Scipy Lecture Notes by many contributors: Read Online
Cheating
Yeah, of course you shouldn't cheat, but what is cheating? The most common form of academic misconduct in these classes involves copying/sharing code for programming projects. Here's an overview of what you can and cannot do:
Acceptable
- any collaboration with your assigned study group members on the group part of a project
- using ChatGPT to ask simple questions. For example: how do I use "self" inside a class constructor?
- copying code from online examples that are NOT specific to your project (if project solutions are leaked online, you may not use them). If you copy code, you must cite it in your code with a comment (think of it like citing a quote in an essay; without the citation, you're plagiarizing).
Here are some code citation templates:
- # copied/adapted from ... (website name) ... (link to the post) ...
e.g., # copied/adapted from Stack Overflow: https://stackoverflow.com/questions/24101524/finding-median-of-list-in-python (For details, see the Citing Code section below.)
- # copied/adapted from ... (Large Language Models name) ... (prompt used) ...
e.g., # copied/adapted from GPT4: "write a Python function to find the median of a list."
NOT Acceptable
- using ChatGPT to solve project questions in entirety
- getting project help of any kind for the group part from anybody who is not either (a) on your assigned study group or (b) 320 staff
- getting project help of any kind for the individual part from anybody who is not 320 staff
- using part or all of project solutions found online
- breaking any of the rules listed under the "Quizzes" section
- reporting lab attendance for yourself or someone else who didn't actually attend (dropping in for a few minutes is not "attending")
- counting lab attendance as merely showing up without spending substantial time on the assigned lab activities
- using TopHat while not actually physically present in the room (since we sometimes use this for attendance)
- helping somebody else cheat
Citing Code: you can copy small snippets of code from Stack Overflow (and other online references) if you cite them. For example, suppose I need to write some code that gets the median number from a list of numbers. I might search for "how to get the median of a list in python" and find a solution at https://stackoverflow.com/questions/24101524/finding-median-of-list-in-python.
I could (legitimately) paste code from that page into my code, as long as it has a comment as follows:
    # copied/adapted from https://stackoverflow.com/questions/24101524/finding-median-of-list-in-python
    def median(lst):
        sortedLst = sorted(lst)
        lstLen = len(lst)
        index = (lstLen - 1) // 2
        if (lstLen % 2):
            return sortedLst[index]
        else:
            return (sortedLst[index] + sortedLst[index + 1])/2.0
In contrast, copying from a nearly complete project (that accomplishes what you're trying to do for your project) is not OK. When in doubt, ask us! The best way to stay out of trouble is to be completely transparent about what you're doing.
Similarity Detection: We will use automated tools to look for similarities across submissions. We take cheating detection seriously to make the course fair to students who put in honest effort.