[Folder 2 - Lesson 7 - notebook-covidspread] fix incorrect indexing#724
[Folder 2 - Lesson 7 - notebook-covidspread] fix incorrect indexing#724wangamber8-collab wants to merge 9 commits intomicrosoft:mainfrom
Conversation
… [2:] to [3:] to drop the correct columns and dropped the province column to ensure the function works properly
There was a problem hiding this comment.
Pull request overview
This pull request fixes incorrect data slicing in a COVID-19 spread analysis notebook that was causing non-date metadata to appear in plots. The fix adjusts the slice indexing from [2:] to [3:] to properly exclude Province/State, Latitude, and Longitude columns, and adds Province/State to the list of columns to be dropped.
Changes:
- Updated data slicing from [2:] to [3:] to exclude three metadata columns instead of two
- Added 'Province/State' column to the drop list alongside 'Lat' and 'Long' for all three dataframes
- Removed trailing whitespace in the mkframe function
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
@leestott Hi! I’m a freshman and fairly new to contributing to open-source projects, so I really appreciate the feedback you left earlier. I’ve gone through and addressed all the comments and made the requested changes. When you have a chance, could you take another look and let me know if this is ready to be merged, or if there’s anything else I should update? Thanks so much for your time! |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 2 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
2-Working-With-Data/07-python/notebook-covidspread.ipynb:1611
- This note claims the first three elements are non-date metadata including
Province/State, but after the earliergroupby('Country/Region').sum()the example output shows onlyLat/Longas non-date columns. Update the note to match the actual post-groupby dataframe shape (or adjust preprocessing soProvince/Stateis still present if that’s intended).
"> **Note** how we use `[3:]` to remove the first three elements of a sequence that contain non-date metadata (the Province/State and geolocation columns). We can also drop those three columns altogether:"
2-Working-With-Data/07-python/notebook-covidspread.ipynb:1622
infected.groupby('Country/Region').sum()earlier produces a dataframe that (per the showninfected.head()output) does not include aProvince/Statecolumn. Dropping['Lat','Long','Province/State']will therefore raise aKeyErroronProvince/Statein the current notebook. Either dropProvince/Statebefore the groupby step, or adderrors='ignore'/ conditional logic so this cell is robust across dataset versions.
"infected.drop(columns=['Lat','Long','Province/State'],inplace=True)\n",
"recovered.drop(columns=['Lat','Long','Province/State'],inplace=True)\n",
"deaths.drop(columns=['Lat','Long','Province/State'],inplace=True)"
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "infected.loc['US'][3:].plot()\n", | ||
| "recovered.loc['US'][3:].plot()\n", | ||
| "plt.show()" |
There was a problem hiding this comment.
infected/recovered/deaths are grouped by Country/Region earlier; the displayed infected.head() after that step shows columns starting with Lat, Long, then date columns (no Province/State). With that shape, slicing with [3:] will skip the first date column (e.g., 1/22/20). Consider keeping [2:] (to drop only Lat/Long) or explicitly dropping the non-date columns before plotting and then plotting the remaining date-indexed series.
This issue also appears in the following locations of the same file:
- line 1611
- line 1620
There was a problem hiding this comment.


… entries
Update the slicing of the series from [2:] to [3:] because indexes 0-2 contains non-date metadata (province, latitude, longitude). The label "longitude" is included in the dataframe and appears in the plot. Additionally, drop the column "province/state" column before the function is called. This is because "province/state" contains non-date data that will cause an error when the function is called.