Thank you for 18 years of DVDs, Netflix

27 minute read

Published: April 22, 2023

Soon, Netflix will be canceling its DVD-by-mail program, the original service that helped Netflix crush Blockbuster and got us used to watching movies on-demand from the comfort of our homes before streaming was a thing. Perhaps not coincidentally, my dad cancelled my family’s subscription to the DVD service this winter. As my brother wisely put it upon hearing my dad’s news, “Netflix can finally stop buying physical DVDs now that their last customer cancelled!”

Of course, when I saw that Netflix had kept my parents’ entire DVD history I knew I had to look at the data. According to the history, my family signed up for Netflix in 2004 - that’s almost 20 years of DVDs! For most of that time, we were on a plan that let us have 3 DVDs concurrently. At some point while I was in high school, I was given full control over one of the 3 DVDs on our plan. Looking at the DVD history, though, this must have been before Netflix even had the concept of separate accounts - my Netflix account on my family’s plan only shows ~10 DVD rentals, but I distinctly remember years of freedom to discover indie movies and curl up in our game room watching movies on my own in high school. It was such a treat to have my own stream of movies that I had full control over. In fact, I still really miss watching indie movies and discovering other excellent movies from their trailers - Netflix’s algorithm really hasn’t figured me out as well as those trailers had.

Getting the data

Anyway, onto the data. When you log in to dvd.netflix.com (a separate website from netlix.com, lol), the history is very simply shown in a table.

I didn’t see any easy way to just export this table, and I didn’t really want to try too hard to find a legit way to scrape the site (especially since I figured there’d be complex auth to get around), so I started with the good ol’ “inspect page” method. (Actually, I started with copy-paste but that didn’t work.) Turns out the information was easily accessible in the html page itself, so I went ahead and just downloaded the html pages for my mom and dad’s account histories. My parents started each using their own account to rent DVD’s, once the concept of accounts was implemented, and my dad’s account had about 500 entries on it so I figured it might have different movies on it.

With a combination of BeautifulSoup’s documentation, poking around via Chrome’s inspect tool, and good ol’ ctrl-F, I was able to pretty easily figure out how to extract all the information I needed.

from bs4 import BeautifulSoup
import pandas as pd

import calmap

import matplotlib.pyplot as plt

def extract_one_movie_info(m):
    """m is a BeautifulSoup object with one movie's row of info"""
    position = m.find('div', 'position').text

    title = m.find('a', 'title').string

    # year, rating, and duration
    meta = m.find('p', 'metadata')
    year, movie_rating, duration = [x.text for x in meta.find_all('span')]

    # get the ratings
    user_rating = m.find('span',  attrs={'data-userrating': True}).attrs['data-userrating']
    avg_rating = m.find('span',  attrs={'data-userrating': True}).attrs['data-rating']

    ship_date = m.find('div', 'shipped').text
    return_date = m.find('div', 'returned').text

    return (
        position,
        {
            'title': title,
            'year_or_season': year,
            'movie_rating': movie_rating,
            'disc_or_duration': duration,
            'user_rating': user_rating,
            'avg_rating': avg_rating,
            'ship_date': ship_date,
            'return_date': return_date
        }
    )

def extract_movie_dict(soup):
    """soup is the parsed html containing the full DVD history.
    Returns a dict where the key is the position of the movie (i.e. row number)
    """

    # There should only be two tags with these, one with the full table of info and
    # one with some placeholder code that I'm assuming populates the front-end somehow
    hist = soup.find_all('div', id='historyList')[0]

    # Get all of the movie elements, they're in a <li> </li> blocks woohoo!
    movies = hist.find_all('li', id=True)

    movie_dict = dict([extract_one_movie_info(m) for m in movies])

    return movie_dict

with open('DVD Netflix-Alain.html', 'r') as f1:
    soup1 = BeautifulSoup(f1, 'html.parser')

dad_dict = extract_movie_dict(soup1)

with open('DVD Netflix-Nadine.html', 'r') as f2:
    soup2 = BeautifulSoup(f2, 'html.parser')

mom_dict = extract_movie_dict(soup2)

assert len(dad_dict) == 508
assert len(mom_dict) == 1598

# | operator is new in python 3.9: https://docs.python.org/3/library/stdtypes.html#mapping-types-dict
movie_dict = {'mom_' + k: v for k,v in mom_dict.items()} | {'dad_' + k: v for k,v in dad_dict.items()}

# make sure we didn't drop any keys
assert len(movie_dict) == len(dad_dict) + len(mom_dict)

df = pd.DataFrame(movie_dict).T
df.head()

	title	year_or_season	movie_rating	disc_or_duration	avg_rating	ship_date	return_date
mom_1	Daughters of the Dust	1991	NR	1h 53m	3	12/05/22	Returned 12/29/22
mom_2	I Vitelloni	1953	NR	1h 43m	4	11/23/22	Returned 12/05/22
mom_3	Mutiny on the Bounty	1935	NR	2h 12m	3.9	11/18/22	Returned 11/23/22
mom_4	The Pervert's Guide to Ideology	2012	NR	2h 16m	3.7	10/04/22	Returned 11/17/22
mom_5	Diabolically Yours / The Widow Couderc	1967	NR	3h 2m	3.3	09/27/22	Returned 10/04/22

df['user_rating'] = df['user_rating'].astype(float)
df['avg_rating'] = df['avg_rating'].astype(float)

# Some more parsing to get the dates right
df['ship_date'] = pd.to_datetime(df['ship_date'], format='%m/%d/%y')
df['return_date'] = pd.to_datetime(df['return_date'].str.split(' ').str[1], format='%m/%d/%y')

df.head()

	title	year_or_season	movie_rating	disc_or_duration	avg_rating	ship_date	return_date
mom_1	Daughters of the Dust	1991	NR	1h 53m	3.0	2022-12-05	2022-12-29
mom_2	I Vitelloni	1953	NR	1h 43m	4.0	2022-11-23	2022-12-05
mom_3	Mutiny on the Bounty	1935	NR	2h 12m	3.9	2022-11-18	2022-11-23
mom_4	The Pervert's Guide to Ideology	2012	NR	2h 16m	3.7	2022-10-04	2022-11-17
mom_5	Diabolically Yours / The Widow Couderc	1967	NR	3h 2m	3.3	2022-09-27	2022-10-04

How many movies did we rent? (But first: a lot of data cleaning)

First, let’s get some summary statistics about how many movies we rented, and if any of those rentals were of movies we had already rented.

But before I can do that, I need to make sure that all the movies in my dataset are unique.

# Are all movies unique?
df.groupby('title').filter(lambda x: len(x['title']) > 1).sort_values(by='title').head(20)

	title	year_or_season	movie_rating	disc_or_duration	user_rating	avg_rating	ship_date	return_date
dad_461	1 Giant Leap	2002	NR	2h 35m	2.0	3.1	2007-10-19	2007-10-30
mom_1296	1 Giant Leap	2002	NR	2h 35m	0.0	3.7	2007-10-19	2007-10-30
dad_463	10 mph	2007	NR	1h 32m	3.0	3.5	2007-10-02	2007-10-10
mom_1300	10 mph	2007	NR	1h 32m	0.0	2.8	2007-10-02	2007-10-10
dad_361	127 Hours	2010	R	1h 34m	3.0	4.2	2011-07-06	2011-07-19
mom_974	127 Hours	2010	R	1h 34m	0.0	3.8	2011-07-06	2011-07-19
dad_68	1917	2019	R	1h 59m	0.0	4.7	2020-09-22	2020-10-07
mom_179	1917	2019	R	1h 59m	0.0	4.3	2020-09-22	2020-10-07
dad_221	45 Years	2015	R	1h 35m	5.0	3.8	2016-07-18	2016-08-02
mom_571	45 Years	2015	R	1h 35m	0.0	3.4	2016-07-18	2016-08-02
mom_148	A Bad Moms Christmas	2017	R	1h 44m	0.0	2.9	2021-02-09	2021-02-16
dad_56	A Bad Moms Christmas	2017	R	1h 44m	0.0	3.7	2021-02-09	2021-02-16
mom_888	A Better Life	2011	PG-13	1h 37m	0.0	3.8	2012-07-17	2012-07-31
dad_325	A Better Life	2011	PG-13	1h 37m	4.0	4.4	2012-07-17	2012-07-31
dad_422	A Collection of 2007 Academy Award Nominated S...	2007	NR	3h 15m	0.0	3.9	2009-09-01	2009-09-18
mom_1133	A Collection of 2007 Academy Award Nominated S...	2007	NR	3h 15m	0.0	3.5	2009-09-01	2009-09-18
mom_293	A French Village: The Complete Collection	Collection All	NR	Disc 9	0.0	4.6	2019-08-29	2019-09-09
mom_292	A French Village: The Complete Collection	Collection All	NR	Disc 10	0.0	4.6	2019-09-04	2019-09-10
mom_296	A French Village: The Complete Collection	Collection All	NR	Disc 7	0.0	4.6	2019-08-19	2019-08-29
mom_291	A French Village: The Complete Collection	Collection All	NR	Disc 12	0.0	4.6	2019-09-10	2019-09-17

Answer: definitely not.

Looks ike there are two ways for a movie to be duplicated: (1) it’s part of a TV series collection, or (2) it shows up on both mom and dad’s histories. For #2, I initially thought it would only be when they both rated the movie differently, but you can see above that there’s one example (“A Bad Moms Christmas”) where neither of them rated it, but for some reason they have difffferent average ratings. It’s very possible that my assumption of what the “data-rating” field means is wrong (maybe it’s the average rating for the type of user they’ve categorized each acount in?), or perhaps there’s something about when the data was last updated (though that would be strange because I downloaded both of these histories on the same day).

Anyway, let’s keep going and worry about figuring out the ratings later, if at all. For now, it looks like what I care about as a unique rental is a unique combination of title and “duration” column. Did we ever rent the same movie twice?

df.groupby(['title', 'disc_or_duration']).filter(
    lambda x: len(x['ship_date'].unique()) > 1
).sort_values(by=['title', 'disc_or_duration']).head(20)

	title	year_or_season	movie_rating	disc_or_duration	user_rating	avg_rating	ship_date	return_date
mom_667	A Most Wanted Man	2014	R	2h 1m	0.0	3.9	2015-05-05	2015-05-12
mom_693	A Most Wanted Man	2014	R	2h 1m	0.0	3.9	2015-01-21	2015-02-02
dad_262	A Most Wanted Man	2014	R	2h 1m	4.0	4.2	2015-01-21	2015-02-02
mom_797	Anchorman: The Legend of Ron Burgundy	2004	UR	1h 34m	2.0	2.4	2013-08-20	2013-08-24
mom_1382	Anchorman: The Legend of Ron Burgundy	2004	UR	1h 34m	2.0	2.4	2006-11-27	2006-12-05
dad_297	Anchorman: The Legend of Ron Burgundy	2004	UR	1h 34m	2.0	2.9	2013-08-20	2013-08-24
dad_491	Anchorman: The Legend of Ron Burgundy	2004	UR	1h 34m	2.0	2.9	2006-11-27	2006-12-05
mom_175	Awakenings	1990	PG-13	2h 0m	0.0	4.1	2020-10-06	2020-10-15
mom_642	Awakenings	1990	PG-13	2h 0m	0.0	4.1	2015-08-24	2015-09-02
mom_795	Bad Education	2004	NC-17	1h 46m	4.0	4.1	2013-08-24	2013-08-31
mom_1229	Bad Education	2004	NC-17	1h 46m	4.0	4.1	2008-06-20	2008-07-15
dad_296	Bad Education	2004	NC-17	1h 46m	0.0	4.1	2013-08-24	2013-08-31
mom_312	Before Sunset	2004	R	1h 20m	2.0	3.8	2019-04-24	2019-05-13
mom_1371	Before Sunset	2004	R	1h 20m	2.0	3.8	2007-01-08	2007-01-22
mom_1476	Before Sunset	2004	R	1h 20m	2.0	3.8	2005-10-25	2005-11-09
dad_126	Before Sunset	2004	R	1h 20m	4.0	3.4	2019-04-24	2019-05-13
mom_258	Big Little Lies	Season 2	TV-MA	Disc 1	0.0	4.4	2020-01-24	2020-02-11
mom_429	Big Little Lies	Season 1	TV-MA	Disc 1	0.0	4.4	2018-01-23	2018-02-05
dad_109	Big Little Lies	Season 2	TV-MA	Disc 1	5.0	5.0	2020-01-24	2020-02-11
dad_167	Big Little Lies	Season 1	TV-MA	Disc 1	5.0	5.0	2018-01-23	2018-02-05

Ok, so yes there are definitely movies that were rented multiple times! Interestingly, some of these are duplicated between both mom and dad’s accounts but some aren’t. I’m assuming this has to do with how Netflix handled the transition between not having accounts, having distinct accounts in the same family plan, and perhaps also how the DVD’s were shared across accounts vs. assigned to individual accounts.

I don’t really care about those intricacies, since I’m just taking this data as the holistic family plan. Let’s get rid of these fake duplicates, and consider coming back later to do some “mom vs. dad” analyses.

Actually, before we move on let’s check that the two lists are actually unique…

[k for k in dad_dict if k not in mom_dict]

[]

D’oh! The history I downloaded from my dad’s account is a subset of the history in my mom’s. So I could have simplified this whole thing by just looking at her history export, though I suppose that would have removed any of the mom vs. dad rating discrepancies. Anyway, I don’t care about those so let’s get on with the plan.

print(f'With duplicates: {df.shape}, without: {df.drop_duplicates().shape}')
df = df.drop_duplicates()

With duplicates: (2106, 8), without: (2090, 8)

# Ok now how many movies did we rent twice?
print(f"Total rentals = {df[['title', 'year_or_season', 'disc_or_duration', 'ship_date']].drop_duplicates().shape[0]}")

# There must be a way to use transform instead of all these reset_index...
(df[['title', 'year_or_season', 'disc_or_duration', 'ship_date']]
 .drop_duplicates()
 .groupby(['title', 'year_or_season', 'disc_or_duration'])
 .size()
 .reset_index(name='n_rentals')
 ['n_rentals'].value_counts()
).reset_index().sort_values('n_rentals')

Total rentals = 1596

	n_rentals	count
0	1	1441
1	2	76
2	3	1

In total, my parents made a total of 1596 unique rentals. Of these, 1441 were for movies they rented only once. They rented 76 movies twice, and one movie three times. Let’s see what the lucky movie was!

(df[['title', 'year_or_season', 'disc_or_duration', 'ship_date']]
 .drop_duplicates()
 .groupby(['title', 'year_or_season', 'disc_or_duration'])
 .filter(lambda x: len(x['title']) >= 3)
)

	title	year_or_season	disc_or_duration	ship_date
mom_312	Before Sunset	2004	1h 20m	2019-04-24
mom_1371	Before Sunset	2004	1h 20m	2007-01-08
mom_1476	Before Sunset	2004	1h 20m	2005-10-25

Looks like it’s Before Sunset, which makes sense - it came out in 2004, and was probably a movie that my parents and I both rented separately while I was in high school, and then that I guess my parents re-watched in 2019.

Let’s see how big of a gap there was between the two rentals for the movies we rented twice.

double_rentals = (df[['title', 'year_or_season', 'disc_or_duration', 'ship_date']]
 .drop_duplicates()
 .groupby(['title', 'year_or_season', 'disc_or_duration'])
 .filter(lambda x: len(x['title']) == 2)
)

delta_rental = (
    double_rentals.groupby('title').apply(
        lambda x: x['ship_date'].max() - x['ship_date'].min()
    ).reset_index(name='time_between_rentals')
)

delta_rental['days_between_rentals'] = delta_rental['time_between_rentals'].apply(lambda x: x.days)
delta_rental['years_between_rentals'] = delta_rental['days_between_rentals'] / 365.

ax = delta_rental['years_between_rentals'].plot(kind='hist')
ax.set_title('Gaps between renting the same movie twice')
ax.set_xlabel('Years between rentals')
ax.set_ylabel('Number of movies')

Text(0, 0.5, 'Number of movies')

png

delta_rental.sort_values(by='years_between_rentals', ascending=False)[['title', 'time_between_rentals', 'years_between_rentals']]

	title	time_between_rentals	years_between_rentals
66	Wasabi	4891 days	13.400000
23	Heat	4850 days	13.287671
65	War Dance	4672 days	12.800000
50	The Lady Vanishes	4613 days	12.638356
38	Raise the Red Lantern	3828 days	10.487671
...	...	...	...
55	The Miseducation of Cameron Post	15 days	0.041096
24	Homeland	14 days	0.038356
34	Once Upon a Time in Hollywood	4 days	0.010959
45	The Curious Case of Benjamin Button	1 days	0.002740
60	The Soloist	1 days	0.002740

69 rows × 3 columns

df.query('title == "The Curious Case of Benjamin Button"')

	title	year_or_season	movie_rating	disc_or_duration	user_rating	avg_rating	ship_date	return_date
mom_1155	The Curious Case of Benjamin Button	2008	PG-13	2h 46m	4.0	4.0	2009-06-16	2009-06-23
mom_1156	The Curious Case of Benjamin Button	2008	PG-13	2h 46m	4.0	4.0	2009-06-15	2009-06-19
dad_428	The Curious Case of Benjamin Button	2008	PG-13	2h 46m	5.0	4.1	2009-06-16	2009-06-23

There are a handful of movies that we rented twice ten years apart, and some that we rented one day apart! The ones that we rented one day apart might be a fluke, or perhaps they were movies that we had on the waitlist with both of our accounts and didn’t coordinate to not duplicate them. That makes sense.

Anyway, now that I understand the data much better, let’s dig in some more to all the non-rating-related information.

df = df[['title', 'year_or_season', 'disc_or_duration', 'ship_date', 'return_date']].drop_duplicates()

# first and last movie rental?
# how many movies per month?
# average duration of keeping movies

(df['return_date'].max() - df['ship_date'].min()).days / 365.

18.284931506849315

Wow, we were signed up for Netflix’s DVD service for over 18 years! That’s pretty amazing, and probably outlasts every commitment my parents made apart from maybe their longest jobs and homes and, oh right, their kids.

1596 rentals over 18 years is a little over 88 movies per year, which is about 1.5 movies per week for 18 years. Considering that we were subscribed to a plan with 3 DVD’s for the majority of that time, that’s a pretty impressive utilization rate.

Let’s see if we can visualize this data nicely. I’ll use the return date as a proxy for when the movie was watched, since we were usually pretty prompt about returning the movies after watching them.

Side note for the folks interested in data wrangling: I realized that I have 1596 unique rentals when considering the ship date, but 1598 unique rows in the data when including the return date for this analysis. It looks like there are two movies with the same ship date but different return dates; I’m assuming that’s a bug in Netflix’s data, unless my parents and I both rented the same DVD on the exact same day and returned them both exactly three days apart. Given that I’ve still never seen The Big Sick and don’t know what The Harvey Girls is, I’m betting on dirty data.

df.groupby(
    ['title', 'year_or_season', 'disc_or_duration', 'ship_date']
).size().reset_index(name='size').sort_values(by='size')

	title	year_or_season	disc_or_duration	ship_date	size
0	1 Giant Leap	2002	2h 35m	2007-10-19	1
1067	Summer Hours	2008	1h 43m	2015-09-02	1
1066	Suits	Season 1	Disc 1	2012-05-22	1
1065	Suffragette	2015	1h 46m	2016-12-06	1
1064	Stranger than Paradise	1984	1h 29m	2010-09-27	1
...	...	...	...	...	...
525	God Grew Tired of Us	2006	1h 29m	2011-08-23	1
534	GoodFellas	1990	2h 25m	2007-03-12	1
1595	Zero Dark Thirty	2012	2h 37m	2013-04-26	1
1140	The Big Sick	2017	1h 59m	2017-10-24	2
1244	The Harvey Girls	1946	1h 41m	2021-12-13	2

1596 rows × 5 columns

df.query('title == "The Big Sick"')

	title	year_or_season	disc_or_duration	ship_date	return_date
mom_445	The Big Sick	2017	1h 59m	2017-10-24	2017-10-30
mom_446	The Big Sick	2017	1h 59m	2017-10-24	2017-10-27

df.query('title == "The Harvey Girls"')

	title	year_or_season	disc_or_duration	ship_date	return_date
mom_68	The Harvey Girls	1946	1h 41m	2021-12-13	2021-12-20
mom_69	The Harvey Girls	1946	1h 41m	2021-12-13	2021-12-17

# Remove the extra rows
df = df.drop(['mom_69', 'mom_446'])

Ok, back to your regularly programmed visualization. I did a quick google and stumbled across a library called calmap which seems to make Github-style calendars super easily. Heck yes, let’s give it a try!

18 years of rentals

fig, ax = calmap.calendarplot(
    data=df.groupby('return_date').size(),
    vmin=0,
    ncols=3,
    fig_kws={'figsize': (15, 10)},
    yearlabel_kws={'fontsize': 14, 'color': 'gray'}
)

png

Daily rental patterns

First off, a quick guide to reading this sort of plot (which, despite staring at so many on Github, I’ve never really understood super well). Each plot shows 365 boxes, where each box is a day. Each row is a day of the week and each column is a week. The boxes are colored by how many DVDs were returned on that day (darker means more DVDs). So if you see a row with lots of filled-in boxes like a horizontal line, that means that we returned DVDs on the same day of the week across multiple weeks. Let’s say it’s the second row from the top, that would mean that Tuesdays are a frequent return day. Seeing a column of filled-in boxes would mean that we returned DVDs every day on a given week.

Ok, now that we’re oriented we can start to pick out some patterns. The first sort of pattern that sticks out to me is about which days we returned the DVDs. First, it looks like we rarely returned movies on two different days per week - you can see that there are very few columns with two filled-in boxes. Second, we never return DVDs on Saturday or Sunday (there are no filled-in boxes in the bottom two rows). It would make sense for Netflix’s DVD receiving department to be closed on weekends, so that checks out. Finally, it looks like our most frequent return day is Tuesday – that makes perfect sense! My parents tend to watch movies over the weekend, which means they would get picked up by USPS on Monday and received by Netflix the following Tuesday.

Another observation is that we would sometimes go months without returning any DVDs - you can see this as areas where there are multiple columns in a row of empty boxes. My guess is that these likely correspond to periods when my parents were on vacation, out of town, or otherwise busy. You can see examples of these gaps in June and July of a handful of years, which is what tipped me off to this vacation hypothesis. But there’s a lot of gaps, so I don’t think I’ll ask them to corroborate this hypothesis with the most recent dates of their big RV trips.

Return day consistency seems informative

Finally, the consistency of which days of the week we returned DVDs is intriguing - there are some years where it’s really consistent (the filled-in boxes are all on the same two-ish rows) and others where it’s not. From just looking at the plot, it seems that 2005-2009 didn’t have super consistent return days of the week - that makes sense, this was the period where I lived at home and had my own dedicated DVD (before the days of password sharing, this is how we shared accounts!). My guess is that I either watched movies on weekdays sometimes or, more likely, was less prompt at returning them after I watched them, which would explain the variety of return days.

The following 5 years, 2020-2015, had a much more consistent return day pattern, with most returned on Mondays or Tuesdays. This also makes sense, as this was the period where my parents were both empty nesters but still working: during this time, they would have been more likely to watch movies on the weekend than during the week, thus mailing them back on Mondays and Netflix receiving them on Tuesdays.

Then 2016 and 2017 are less consistent again - my guess is that this is right around when my mom retired. When I was talking to my parents about this analysis, my mom mentioned that when she first retired she watched a lot of TV series on Netflix DVDs. I also know that it took my parents a while to start paying for all the streaming services, so it would make sense that in these first few years after retirement you see a lot less consistency in the return days, as my mom was likely burning through TV shows via Netflix’s DVD service!

2018 and onwards gets decently consistent again. My hypothesis here is that 2018 is around when my parents started paying for and using streaming services, so they stopped watching as many movies and TV shows on Netflix DVDs. That would leave Netflix DVDs only for the more obscure foreign films or recently released movies not yet available on streaming that they wanted to watch, and everything else would have been watched via streaming. In this scenario, it makes sense that the behavior would revert to a consistent early-week return date: my dad was still working, and so I assume that they watched the movies that they ordered from Netflix on the weekends, and my mom watched other things during the week via streaming services.

Finally, 2021 and 2022 are slightly less consisetnt and much more sparse than any of the other years. My dad retired in December 2021, which is when my parents started taking a lot of trips in their RV. But I don’t think that’s what explains the sparseness - my guess is that they switched their plan from 3 DVDs to 1 sometime in 2021, which led to the slow death of their usage of the service.

Bring in the parents: putting my hypotheses to the test

I texted my parents to see if I could confirm some of these hypotheses. First off, my mom retired in December 2015 – huzzah, I was right! Pretty cool that you can see her retirement just in the distribution of return days of the week.

Then, my dad told me that they had access to Netflix’s streaming as soon as it started in 2007, but my mom doesn’t think they started using it regularly until around 2015. My parents also got Apple TV Box in November 2020, which made streaming very easy across the various services. So that doesn’t check out with my “2018 is when they started streaming regularly” hypothesis - something else must have happened in 2018 that got them back to a more consistent DVD viewing pattern. Maybe my mom ran out of TV shows that Netflix had on DVD?

Finally, they switched to the plan with only 1 DVD in August 2022 - way after the 2021 sparseness started! So it must have gone the other way: their utilization was going down, and so they downgraded their plan.

Movie quantity over time

Next up, I want to look at a more high-level summary of the amount of movies we watched. My guess is that we watched way more while I was still living at home, and then that it spiked again after my mom retired. I might also guess that my parents watched more movies in 2020 and 2021 during Covid, but I’m not sure if that would be reflect in the number of DVDs since that’s also when they were using streaming services.

# Let's look at it monthly
df['return_month'] = pd.to_datetime(df['return_date'].dt.strftime('%Y-%m'))

movies_per_month = df.groupby('return_month').size().reset_index(name='n_movies')
movies_per_month.head()

	return_month	n_movies
0	2004-09-01	3
1	2004-10-01	8
2	2004-11-01	10
3	2004-12-01	10
4	2005-01-01	10

fig, ax = plt.subplots(figsize=(10, 5))
movies_per_month.plot(kind='scatter', x='return_month', y='n_movies', ax=ax)

ax.set_title('Number of movies per month')
ax.set_ylabel('Movies')
ax.set_xlabel('')

Text(0.5, 0, '')

png

Welp, nope - doesn’t look like there’s any discernible pattern in terms of the number of movies we watched over the years. It’s very interesting to me that you don’t see any obvious decreases when I moved out or even when my parents bumped their plan down to one DVD per month (but maybe that’s because there isn’t enough data to see that).

I wonder how this compares to the maximum possible number of movies per month. Let’s do some back-of-the-envelope math!

Assuming:

we have a plan that lets us have 3 DVDs at a time
we can only watch one movie per day
it takes Netflix one day to process a returned movie and ship out the next one
they send it with 2 day shipping to get to us
and when we mail it back, it goes with overnight return shipping

That means that each movie takes up a total of 5 days (1 day to be processed by Netflix + 2 days in the mail to get to us + 1 day to be watched + 1 day to return to Netflix). So each of the 3 DVDs can go through 6 full rental cycles per month, meaning that the max number of movies we could watch in a month is 18. On average, my family watched 7.3 movies per month – a little less than half of the possible rentals. But there were some months when we went through 14 movies, a 75% utilization rate! For a working family who definitely does not watch movies every day, not bad.

With that, thanks for joining me on this journey down Netflix memory lane! RIP Netflix DVD service, you were a true trailblazer ahead of your times, and those of us who were loyal fans for over 15 years thank you.

Share on

Twitter Facebook Google+ LinkedIn

Claire Duvallet

Thank you for 18 years of DVDs, Netflix

Getting the data

How many movies did we rent? (But first: a lot of data cleaning)

18 years of rentals

Daily rental patterns

Return day consistency seems informative

Bring in the parents: putting my hypotheses to the test

Movie quantity over time

Share on

You May Also Enjoy

Early startup employee lessons learned, part 4: adapting to your changing role

Early startup employee lessons learned, part 3: building culture

The Boston morning commute time warp

Early startup employee lessons learned, part 2: coping with the coaster