# run in the command line
#pip install youtube-data-api
Workshop Analyzing Social Media Data - Youtube Data
Introduction
This notebook walks through Python code developed by Megan Brown, Senior Engineer at the Center for Social Media and Politics at NYU. The tutorial uses the youtube-data-api library.
If you want to learn other strategies for analyzing YouTube data, in particular a very clever way to estimate the political leaning of YouTube videos, I strongly suggest checking out the article by Lai et al., “Estimating Ideology of Youtube videos”.
What kind of data can you get from the Youtube API?
YouTube has a very extensive API, and there is a lot of data you can get access to. A comprehensive list is available in the YouTube Data API documentation.
What is included in the package:
- video metadata
- channel metadata
- playlist metadata
- subscription metadata
- featured channel metadata
- comment metadata
- search results
How to Install
The package is on PyPI, so you can install it via pip (pip install youtube-data-api).
How to get an API key
A quick guide: https://developers.google.com/youtube/v3/getting-started
You need a Google Account to access the Google API Console, request an API key, and register your application. You can use your GMail account for this if you have one.
Create a project in the Google Developers Console and obtain authorization credentials so your application can submit API requests.
After creating your project, make sure the YouTube Data API is one of the services that your application is registered to use.
Go to the API Console and select the project that you just registered.
Visit the Enabled APIs page. In the list of APIs, make sure the status is ON for the YouTube Data API v3. You do not need to enable OAuth 2.0 since there are no methods in the package that require it.
An overview of Youtube API
Calling the libraries
# call some libraries
import os
import datetime
import pandas as pd
# pass your keys
from youtube_api import YouTubeDataAPI
from youtube_api.youtube_api_utils import *
from dotenv import load_dotenv
# load keys from environmental var
# .env file in cwd
load_dotenv()
api_key = os.environ.get("YT_KEY")
# create a client
yt = YouTubeDataAPI(api_key)
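If the environment variable is missing, os.environ.get silently returns None. The following is a minimal sketch of a guard that fails early with a readable message, assuming the key is stored under the name YT_KEY as in the cell above. The function read_api_key is a hypothetical helper written for this tutorial, not part of youtube-data-api.

```python
import os


def read_api_key(env_var="YT_KEY"):
    """Return the API key stored in `env_var`, or raise a clear error.

    Hypothetical helper for illustration; not part of youtube-data-api.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Add a line like {env_var}=<your key> "
            "to the .env file in the working directory."
        )
    return key


# Demonstration with a placeholder value (not a real key).
os.environ["YT_KEY"] = "placeholder-key"
print(read_api_key())
```

In the notebook, the returned value could be passed to YouTubeDataAPI in place of api_key.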
Starting with a channel name and getting some basic metadata
Let’s start with the LastWeekTonight channel: https://www.youtube.com/user/LastWeekTonight
First we need to get the channel id
channel_id = yt.get_channel_id_from_user('LastWeekTonight')
print(channel_id)
UC3XTzVzaHQEd30rQbuvCtTQ
Channel metadata
# collect metadata
yt.get_channel_metadata(channel_id)
{'channel_id': 'UC3XTzVzaHQEd30rQbuvCtTQ',
'title': 'LastWeekTonight',
'account_creation_date': 1395178899.0,
'keywords': None,
'description': 'Breaking news on a weekly basis. Sundays at 11PM - only on HBO.\nSubscribe to the Last Week Tonight channel for the latest videos from John Oliver and the LWT team.',
'view_count': '3472706969',
'video_count': '400',
'subscription_count': '9070000',
'playlist_id_likes': '',
'playlist_id_uploads': 'UU3XTzVzaHQEd30rQbuvCtTQ',
'topic_ids': 'https://en.wikipedia.org/wiki/Politics|https://en.wikipedia.org/wiki/Society|https://en.wikipedia.org/wiki/Entertainment|https://en.wikipedia.org/wiki/Television_program',
'country': None,
'collection_date': datetime.datetime(2022, 10, 18, 23, 17, 20, 78616)}
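In the output above, the counts come back as strings, account_creation_date is a Unix timestamp, and topic_ids packs several URLs into one |-separated string. Here is a minimal sketch of tidying these fields with the standard library, using an abridged copy of the values shown above. The helper tidy_channel_metadata is hypothetical, and reading the timestamp as UTC is an assumption.

```python
from datetime import datetime, timezone


def tidy_channel_metadata(meta):
    """Return a copy of a channel metadata dict with friendlier types.

    Hypothetical helper for illustration; not part of youtube-data-api.
    """
    tidy = dict(meta)
    # Counts arrive as strings in the output shown above.
    for field in ("view_count", "video_count", "subscription_count"):
        tidy[field] = int(meta[field])
    # Assumption: the timestamp is seconds since the Unix epoch, in UTC.
    tidy["account_creation_date"] = datetime.fromtimestamp(
        meta["account_creation_date"], tz=timezone.utc
    )
    # Split the pipe-separated topic URLs into a list.
    tidy["topic_ids"] = meta["topic_ids"].split("|") if meta["topic_ids"] else []
    return tidy


# Abridged copy of the metadata printed above.
sample = {
    "account_creation_date": 1395178899.0,
    "view_count": "3472706969",
    "video_count": "400",
    "subscription_count": "9070000",
    "topic_ids": "https://en.wikipedia.org/wiki/Politics|https://en.wikipedia.org/wiki/Society",
}
tidy = tidy_channel_metadata(sample)
print(tidy["account_creation_date"].date(), tidy["subscription_count"])
```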
Subscriptions of the channel.
pd.DataFrame(yt.get_subscriptions(channel_id))
 | subscription_title | subscription_channel_id | subscription_kind | subscription_publish_date | collection_date |
---|---|---|---|---|---|
0 | trueblood | UCPnlBOg4_NU9wdhRN-vzECQ | youtube#channel | 1.395357e+09 | 2022-10-18 23:17:20.206669 |
1 | GameofThrones | UCQzdMyuz0Lf4zo4uGcEujFw | youtube#channel | 1.395357e+09 | 2022-10-18 23:17:20.206716 |
2 | HBO | UCVTQuK2CaWaTgSsoNkn5AiQ | youtube#channel | 1.395357e+09 | 2022-10-18 23:17:20.206752 |
3 | HBOBoxing | UCWPQB43yGKEum3eW0P9N_nQ | youtube#channel | 1.395357e+09 | 2022-10-18 23:17:20.206792 |
4 | Cinemax | UCYbinjMxWwjRpp4WqgDqEDA | youtube#channel | 1.424812e+09 | 2022-10-18 23:17:20.206835 |
5 | HBODocs | UCbKo3HsaBOPhdRpgzqtRnqA | youtube#channel | 1.395357e+09 | 2022-10-18 23:17:20.206870 |
6 | HBOLatino | UCeKum6mhlVAjUFIW15mVBPg | youtube#channel | 1.395357e+09 | 2022-10-18 23:17:20.206904 |
7 | OfficialAmySedaris | UCicerXLHzJaKYHm1IwvTn8A | youtube#channel | 1.461561e+09 | 2022-10-18 23:17:20.206937 |
8 | Real Time with Bill Maher | UCy6kyFxaMqGtpE3pQTflK8A | youtube#channel | 1.418342e+09 | 2022-10-18 23:17:20.206971 |
List of videos of the channel
To get all the videos ever posted by a channel, you first need to convert the channel_id into a playlist id, using a function from youtube_api_utils in the package. Then you can get the video ids and collect metadata, comments, and much more.
from youtube_api.youtube_api_utils import *
playlist_id = get_upload_playlist_id(channel_id)
print(playlist_id)
## Get video ids
videos = yt.get_videos_from_playlist_id(playlist_id)
df = pd.DataFrame(videos)
UU3XTzVzaHQEd30rQbuvCtTQ
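Comparing the two ids printed so far, the uploads playlist id differs from the channel id only in its prefix (UC becomes UU), which also matches the playlist_id_uploads field in the channel metadata. Below is a minimal sketch of that convention, assuming it holds for other channels as well. The name upload_playlist_id_sketch is hypothetical, and this is not necessarily how get_upload_playlist_id is implemented.

```python
def upload_playlist_id_sketch(channel_id):
    """Derive the uploads playlist id by swapping the 'UC' prefix for 'UU'.

    Hypothetical illustration of the pattern visible in the outputs above.
    """
    if not channel_id.startswith("UC"):
        raise ValueError(f"Unexpected channel id format: {channel_id!r}")
    return "UU" + channel_id[2:]


print(upload_playlist_id_sketch("UC3XTzVzaHQEd30rQbuvCtTQ"))
```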
Collect video metadata
# id for videos as a list
df.video_id.tolist()
# grab metadata
video_meta = yt.get_video_metadata(df.video_id.tolist()[:5])
# visualize
pd.DataFrame(video_meta[:2])
 | video_id | channel_title | channel_id | video_publish_date | video_title | video_description | video_category | video_view_count | video_comment_count | video_like_count | video_dislike_count | video_thumbnail | video_tags | collection_date |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Ns8NvPPHX5Y | LastWeekTonight | UC3XTzVzaHQEd30rQbuvCtTQ | 1.666003e+09 | Transgender Rights II: Last Week Tonight with ... | John Oliver discusses the latest round of atta... | 24 | 2227137 | 23236 | 95244 | None | https://i.ytimg.com/vi/Ns8NvPPHX5Y/hqdefault.jpg | | 2022-10-18 23:17:21.274582 |
1 | kCOnGjvYKI0 | LastWeekTonight | UC3XTzVzaHQEd30rQbuvCtTQ | 1.665398e+09 | Crime Reporting: Last Week Tonight with John O... | John Oliver discusses the outlets that cover c... | 24 | 3232207 | 5943 | 89845 | None | https://i.ytimg.com/vi/kCOnGjvYKI0/hqdefault.jpg | | 2022-10-18 23:17:21.274612 |
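Once the metadata is collected, simple per-video ratios are easy to derive. Here is a minimal sketch that uses the view and like counts of the two videos shown above. The counts are cast with int in case they arrive as strings, as the channel counts do, and like_rate is a hypothetical helper, not a function from the package.

```python
def like_rate(video):
    """Return likes per view for one video metadata dict.

    Hypothetical helper; casts counts with int() in case they are strings.
    """
    views = int(video["video_view_count"])
    if views == 0:
        return 0.0
    return int(video["video_like_count"]) / views


# Values copied from the two rows displayed above.
videos_sample = [
    {"video_id": "Ns8NvPPHX5Y", "video_view_count": "2227137", "video_like_count": "95244"},
    {"video_id": "kCOnGjvYKI0", "video_view_count": "3232207", "video_like_count": "89845"},
]

for video in videos_sample:
    print(video["video_id"], round(like_rate(video), 4))
```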
Collect Comments
ids = df.video_id.tolist()[:5]
# loop
list_comments = []
for video_id in ids:
    comments = yt.get_video_comments(video_id, max_results=10)
    list_comments.append(pd.DataFrame(comments))
# concat
df = pd.concat(list_comments)
df.head()
 | video_id | commenter_channel_url | commenter_channel_id | commenter_channel_display_name | comment_id | comment_like_count | comment_publish_date | text | commenter_rating | comment_parent_id | collection_date | reply_count |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Ns8NvPPHX5Y | http://www.youtube.com/channel/UCQcuYcWoTYbtlx... | UCQcuYcWoTYbtlxRMI0bruXQ | Brigid Pfenninger | UgwbjPhusWygy8um6YF4AaABAg | 0 | 1.666164e+09 | I wish people would mind their own business. W... | none | None | 2022-10-18 23:17:21.500352 | 0 |
1 | Ns8NvPPHX5Y | http://www.youtube.com/channel/UClGdRsKW0vxmmo... | UClGdRsKW0vxmmoXPU8wpNgg | E Bellyfish | Ugwuodu_xhHB2_SacIJ4AaABAg | 0 | 1.666164e+09 | Trans rights are human rights. | none | None | 2022-10-18 23:17:21.500415 | 0 |
2 | Ns8NvPPHX5Y | http://www.youtube.com/channel/UCl2MFhAwpOwCgj... | UCl2MFhAwpOwCgjnPIN9zScQ | fantasia243645 | Ugy-aIoF-2KEWrxLHh54AaABAg | 3 | 1.666163e+09 | Why are people allowing children to consent to... | none | None | 2022-10-18 23:17:21.500464 | 2 |
3 | Ns8NvPPHX5Y | http://www.youtube.com/channel/UCI3JogrrM3q8sZ... | UCI3JogrrM3q8sZzcxE23H0w | Democracy Lives | UgxQ--0X5omlypnQBNl4AaABAg | 1 | 1.666163e+09 | The comparison to CRT panic is on point! Unfor... | none | None | 2022-10-18 23:17:21.500511 | 0 |
4 | Ns8NvPPHX5Y | http://www.youtube.com/channel/UCwL5M4gdC00-a5... | UCwL5M4gdC00-a5ZVFNZ0kQA | Lizard King | UgwjukKuviwU4pZRRHZ4AaABAg | 0 | 1.666163e+09 | TBH it sounds pretty hot to fall into some kin... | none | None | 2022-10-18 23:17:21.500557 | 0 |
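A loop over many videos stops at the first request that raises an error (for example, I assume, a video whose comments are unavailable). What follows is a minimal sketch of a more forgiving collector. Here fetch_comments is a hypothetical stand-in for a call such as yt.get_video_comments, passed in as an argument so the example runs without an API key.

```python
def collect_comments(video_ids, fetch_comments):
    """Fetch comments for each id, recording failures instead of stopping.

    `fetch_comments` is any callable that takes a video id and returns a
    list of comment dicts (a stand-in for the real API call).
    """
    collected, failed = [], []
    for video_id in video_ids:
        try:
            collected.extend(fetch_comments(video_id))
        except Exception as err:  # broad on purpose: the error types are not assumed
            failed.append((video_id, str(err)))
    return collected, failed


def fake_fetch(video_id):
    """Fake fetcher used only to demonstrate the control flow."""
    if video_id == "bad_id":
        raise ValueError("comments unavailable")
    return [{"video_id": video_id, "text": "example comment"}]


comments, failed = collect_comments(["Ns8NvPPHX5Y", "bad_id"], fake_fetch)
print(len(comments), failed)
```

With the real client, one could pass lambda vid: yt.get_video_comments(vid, max_results=10) as the second argument and build a DataFrame from the returned list.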
Search
The YouTube API also allows you to search for videos using queries.
df = pd.DataFrame(yt.search(q='urnas, fraude', max_results=10))
df.keys()
df[["channel_title", "video_title"]]
 | channel_title | video_title |
---|---|---|
0 | UOL | Flávio Bolsonaro diz que não teve fraude nas u... |
1 | Rádio BandNews FM | Eleições: Resultados falsos da votação no exte... |
2 | Jovem Pan News | Urna eletrônica: houve fraude no 1º turno? – B... |
3 | CNN Brasil | Análise: Documento do PL de Bolsonaro aponta f... |
4 | Canal Nostalgia | URNA ELETRÔNICA / Dá pra Hackear? |
5 | Jovem Pan News - Bauru | Eleitor não conseguiu votar. Fraude ou manipul... |
6 | Rádio BandNews FM | "Militares não divulgarem resultado da fi... |
7 | Domingo Espetacular | Governo investiga diferença entre pesquisas el... |
8 | Jornalismo TV Cultura | Eleições 2022: Confira momento em que Lula ult... |
9 | UOL | Homem quebra urna eletrônica com socos na Para... |
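A quick first look at search results is to count how many hits each channel contributes. This is a minimal sketch using the channel titles from the table above, copied by hand so that it runs without an API key. With the real DataFrame, the same list could be taken from the channel_title column.

```python
from collections import Counter

# Channel titles copied from the search results displayed above.
channel_titles = [
    "UOL",
    "Rádio BandNews FM",
    "Jovem Pan News",
    "CNN Brasil",
    "Canal Nostalgia",
    "Jovem Pan News - Bauru",
    "Rádio BandNews FM",
    "Domingo Espetacular",
    "Jornalismo TV Cultura",
    "UOL",
]

# Count how often each channel appears among the results.
counts = Counter(channel_titles)
for title, n in counts.most_common(3):
    print(title, n)
```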
Want more?
If you want to learn more about YouTube, you should definitely check out these two papers by my CSMaP colleagues.