In this blog, we take a look at how to scrape IMDb data using Python. On top of the various data points it keeps updated for both movies and small-screen shows, IMDb also lets its users add ratings, and these ratings form the basis of several lists that movie buffs and others use to build their watch lists.

Scraping IMDb top 250 movies in Python. April 19, 2016. 5 minute read.

Web crawling is much easier than it sounds. I started using Python about three weeks ago and now, with the help of a few modules, I'm able to scrape IMDb's (static) pages.

Scraping data from the IMDb top 250 movies page with the fields name, year and rating (using Python). Asish Raz.
imdb.py
from bs4 import BeautifulSoup
import requests
import re

# Download IMDB's Top 250 data
url = 'http://www.imdb.com/chart/top'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')

# Each select() call returns one entry per table row, in chart order
movies = soup.select('td.titleColumn')
links = [a.attrs.get('href') for a in soup.select('td.titleColumn a')]
crew = [a.attrs.get('title') for a in soup.select('td.titleColumn a')]
ratings = [b.attrs.get('data-value') for b in soup.select('td.posterColumn span[name=ir]')]
votes = [b.attrs.get('data-value') for b in soup.select('td.ratingColumn strong')]

imdb = []

# Store each item into dictionary (data), then put those into a list (imdb)
for index in range(0, len(movies)):
    # Separate movie into: 'place', 'title', 'year'
    movie_string = movies[index].get_text()
    # Collapse whitespace and drop the period after the rank
    movie = ' '.join(movie_string.split()).replace('.', '')
    # Ranks are 1-based, so the digit count comes from index + 1
    rank_width = len(str(index + 1))
    # Skip the rank and the following space; drop the trailing ' (YYYY)'
    movie_title = movie[rank_width + 1:-7]
    year = re.search(r'\((.*?)\)', movie_string).group(1)
    place = movie[:rank_width]
    data = {'movie_title': movie_title,
            'year': year,
            'place': place,
            'star_cast': crew[index],
            'rating': ratings[index],
            'vote': votes[index],
            'link': links[index]}
    imdb.append(data)

for item in imdb:
    print(item['place'], '-', item['movie_title'], '(' + item['year'] + ') -', 'Starring:', item['star_cast'])
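
The loop above recovers 'place', 'title' and 'year' by slicing the cell text at fixed offsets, which depends on the number of digits in the rank and also strips every period from the title. As a rough alternative sketch (not part of the original script), the same three fields can be pulled out with one regular expression. The helper name `parse_title_cell`, the pattern `ROW_PATTERN` and the sample string are assumptions made up for illustration; the sketch only assumes the cell text looks like "rank. title (year)" once whitespace is collapsed.

```python
import re

# Hypothetical pattern: "<rank>. <title> (<4-digit year>)" on a single line.
ROW_PATTERN = re.compile(r'^(\d+)\.\s+(.*?)\s+\((\d{4})\)$')


def parse_title_cell(cell_text):
    """Split the text of one title cell into place, title and year.

    Returns None when the text does not match the assumed layout.
    """
    # Collapse newlines and repeated spaces, as the script above does.
    flat = ' '.join(cell_text.split())
    match = ROW_PATTERN.match(flat)
    if match is None:
        return None
    place, title, year = match.groups()
    return {'place': place, 'movie_title': title, 'year': year}


# Made-up sample that imitates the whitespace get_text() tends to return.
sample = "\n      10.\n      Fight Club\n      (1999)\n"
print(parse_title_cell(sample))
```

Because the title is captured between the rank and the final parenthesised year, periods inside a title are kept, and a cell that does not fit the pattern yields `None` instead of a mis-sliced string.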