In this post, I'm going to discuss how I used my open source Instagram scraper to scrape nearly 25,000 data points from Joe Biden's Instagram page.
Combining selenium and instascrape, I wrote a quick script that automatically scrolled Joe Biden's Instagram page and scraped the first 500 posts. At 49 data points per post, that gave us 24,500 data points to explore 🙌.
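The scrolling script itself isn't shown here, so the following is only a minimal sketch of the scroll-and-collect idea, not the script I actually ran. It assumes you supply a callback that scrolls the page once and returns whatever post URLs are currently visible. collect_post_urls and selenium_scroller are hypothetical helpers (not part of instascrape), and the "/p/" filter assumes Instagram post URLs contain that path segment. Each collected URL would then be handed to instascrape for scraping.

```python
import time
from typing import Callable, Iterable, List


def collect_post_urls(
    scroll_once: Callable[[], Iterable[str]],
    amount: int,
    max_failed_scrolls: int = 5,
) -> List[str]:
    """Scroll repeatedly, keeping unique post URLs in discovery order, until
    `amount` URLs are collected or `max_failed_scrolls` scrolls in a row find nothing new."""
    urls: List[str] = []
    seen = set()
    failed = 0
    while len(urls) < amount and failed < max_failed_scrolls:
        found_new = False
        for url in scroll_once():
            if url not in seen:
                seen.add(url)
                urls.append(url)
                found_new = True
                if len(urls) >= amount:
                    break
        # Reset the stall counter whenever a scroll turns up new posts
        failed = 0 if found_new else failed + 1
    return urls


def selenium_scroller(driver):
    """Hypothetical adapter: build a scroll_once callback around a Selenium WebDriver."""
    from selenium.webdriver.common.by import By  # imported lazily so collect_post_urls works without Selenium

    def scroll_once() -> List[str]:
        # Jump to the bottom of the page so Instagram loads the next batch of posts
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)  # give the new posts time to render
        hrefs = (a.get_attribute("href") for a in driver.find_elements(By.TAG_NAME, "a"))
        return [h for h in hrefs if h and "/p/" in h]  # assumption: post links contain "/p/"

    return scroll_once
```

With a live, logged-in driver, collect_post_urls(selenium_scroller(driver), amount=500) would gather up to the first 500 post URLs. The stall counter is what stops the loop once the page stops serving new posts, so it never spins forever at the end of a profile.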
Let's see what his likes per post look like with a little matplotlib and scikit-learn magic 😏
As expected, we can see steady growth and then a massive spike upwards as election day approached.
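The plotting code isn't included in the post. Here is one way to sketch a likes-per-post chart with a straight-line trend. To stay dependency-light, it fits the trend with numpy.polyfit (ordinary least squares with an intercept, the same fit scikit-learn's LinearRegression gives for a single feature) instead of scikit-learn. It assumes a DataFrame with a likes column and an upload_date column; upload_date, add_linear_trend, and plot_trend are hypothetical names.

```python
import numpy as np
import pandas as pd


def add_linear_trend(df: pd.DataFrame, column: str = "likes") -> pd.DataFrame:
    """Return the posts sorted oldest-first with a least-squares linear trend for `column`."""
    ordered = df.sort_values("upload_date").reset_index(drop=True)  # assumption: an upload_date column exists
    x = np.arange(len(ordered))  # post order: oldest post = 0
    slope, intercept = np.polyfit(x, ordered[column].to_numpy(dtype=float), deg=1)
    ordered[f"{column}_trend"] = slope * x + intercept
    return ordered


def plot_trend(df: pd.DataFrame, column: str = "likes") -> None:
    """Scatter `column` per post over time and overlay the fitted trend line."""
    import matplotlib.pyplot as plt  # imported here so the trend math runs without matplotlib

    trended = add_linear_trend(df, column)
    plt.plot(trended["upload_date"], trended[column], marker=".", linestyle="none", label=column)
    plt.plot(trended["upload_date"], trended[f"{column}_trend"], label="linear trend")
    plt.xlabel("upload date")
    plt.ylabel(f"{column} per post")
    plt.legend()
    plt.show()
```

Because the column is a parameter, plot_trend(dataframe, "comments") draws the comments-per-post version with the same code. A single straight line will understate a late spike like the one before election day; it is only meant as a baseline to compare against.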
Let's take a look at comments per post now for the heck of it:
There's a ton we can do now that the data is available to us, and what you do with it is really up to you. Using the to_dict instance method, I can build a pandas.DataFrame from all of our data for easy analysis in a clean, expressive format. With a one-liner like the following, we can get every post where Joe Biden used a hashtag:
```python
dataframe[dataframe.hashtags.str.len() != 0]
```
Or, say we wanted every post where Joe Biden got more than 1,000,000 likes:
```python
dataframe[dataframe["likes"] > 1000000]
```
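For a self-contained version of those two filters, here they are run against a few made-up rows that stand in for the dictionaries each scraped post's to_dict method returns. The real dictionaries carry 49 fields; only likes, comments, and hashtags are mimicked here, and every number is invented. The sketch also shows that conditions combine with &, and that DataFrame.size counts every cell, which is how 500 posts × 49 fields adds up to 24,500 data points.

```python
import pandas as pd

# Hypothetical stand-ins for post.to_dict() output (real dictionaries have many more keys)
scraped_posts = [
    {"likes": 1_250_000, "comments": 40_000, "hashtags": ["vote"]},
    {"likes": 800_000, "comments": 25_000, "hashtags": []},
    {"likes": 2_100_000, "comments": 90_000, "hashtags": []},
]

# One row per post, one column per scraped field
dataframe = pd.DataFrame(scraped_posts)

# Posts that used at least one hashtag (.str.len() also measures list elements)
with_hashtags = dataframe[dataframe.hashtags.str.len() != 0]

# Posts with more than 1,000,000 likes
over_a_million = dataframe[dataframe["likes"] > 1000000]

# Boolean masks combine with &; each condition needs its own parentheses
popular_with_hashtags = dataframe[
    (dataframe.hashtags.str.len() != 0) & (dataframe["likes"] > 1000000)
]

# rows × columns: 3 × 3 here, 500 × 49 = 24,500 for the full scrape
print(dataframe.size)
```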
...so what are you waiting for? Get out there and start exploring Instagram data programmatically!
If you're interested in reading more about instascrape, check out some of my other posts:
Exploratory data analysis of Instagram using instascrape and Python (Chris Greening, Oct 22 '20, 6 min read)
Downloading recent Instagram photos using instascrape and Python (Chris Greening, Oct 26 '20, 2 min read)
Or better yet, head over to the official repo, drop it a star, and contribute ❤️
chris-greening / instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
instascrape: powerful Instagram data scraping toolkit
DISCLAIMER:
Instagram has gotten increasingly strict about scraping, and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. On its own, the library is designed to be responsible and respectful, and it is up to you to decide what you do with it. I don't claim any responsibility if your Instagram account is affected by how you use this library.
What is it?
instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist's toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.
Key features
Here are a few of the things that…