In today's class, we will focus on:
We start our first lecture by looking at this graph. It shows two things:
Not all of this data is available in digital spaces (like websites, social media apps, and digital archives), but much of it is. As data scientists, a primary skill expected from you is the ability to acquire, process, store, and analyze this data. Today, we will focus on acquiring data in the digital information era.
There are three primary techniques through which you can acquire digital data:
Scraping consists of automatically collecting data available on websites. In theory, you could collect website data by hand, or ask a couple of friends to help you. However, in a world of abundant data, this is rarely feasible, and it will feel unnecessary once you have learned how to collect the data automatically.
Let me give you some examples of websites I have already scraped:
Scraping can be summarized as:
- leveraging the structure of a website to grab its contents;
- using a programming environment (such as R, Python, or Java) to systematically extract that content;
- accomplishing the above in an "unobtrusive" and legal way.
An API is a set of rules and protocols that allows software applications to communicate with each other. APIs provide a front door for developers to interact with a website.
APIs are used for many different types of online communication and information sharing. Among them, many APIs have been developed to provide an easy and official way for developers and data scientists to access data.
Because these APIs are developed by the data owners, they are often more secure, practical, and organized than acquiring data through scraping.
Scraping is a back door for when there is no API, or when we need content beyond the structured fields the API returns. If you can use an API to access a dataset, that is where you will want to go.
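To make the contrast concrete, here is a minimal sketch of what pulling data from an API typically looks like with the requests library. The endpoint and parameters below are hypothetical placeholders, not a real service; the point is simply that an API returns structured data (usually JSON) instead of raw HTML.

import requests

# hypothetical API endpoint: replace with a real service and its documented parameters
url = "https://api.example.com/v1/posts"
params = {"query": "elections", "limit": 10}
response = requests.get(url, params=params)

# APIs typically return structured JSON rather than raw HTML
if response.status_code == 200:
    data = response.json()   # parsed into Python lists/dicts
    print(data)
else:
    print("Request failed with status:", response.status_code)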
Web scraping is legal as long as the scraped data is publicly available and the scraping activity does not harm the website being scraped. These are two hugely relevant conditions. For this reason, before we start coding, it is important to carefully understand what each of them entails.
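A practical first check related to both conditions is the website's robots.txt file, which states which parts of a site automated clients are allowed to fetch. Here is a minimal sketch using Python's standard library module urllib.robotparser (the CNN url is just an example):

from urllib import robotparser

# read the site's robots.txt, which lists what automated clients may fetch
rp = robotparser.RobotFileParser()
rp.set_url("https://www.cnn.com/robots.txt")
rp.read()

# can a generic crawler ("*") fetch the politics section?
print(rp.can_fetch("*", "https://www.cnn.com/politics"))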
Each call to a web server takes time, server cycles, and memory. Most servers can handle significant traffic, but they can't necessarily handle the strain induced by massive automated requests. Your code can overload the site, taking it offline, or causing the site administrator to ban your IP. See Denial-of-service attack (DoS).
We do not want to compromise the functioning of a website just because of our research. First, this overload can crash a server and prevent other users from accessing the site. Second, servers and hosts can, and do, implement countermeasures (e.g., blocking requests from our IP address).
In addition, take it as a best practice to only collect public information. Think about Facebook. In my personal view, it is okay to collect public posts or data from public groups. If you somehow manage to get into private groups, where members have an expectation of privacy, it is not okay to collect their data.
Here is a list of good practices for scraping (several of them are illustrated in the sketch below):
- Prefer an official API whenever one is available.
- Only collect information that is publicly available.
- Space out your requests (e.g., with time.sleep() and a random delay) so you do not overload the server.
- Save the HTML locally while developing your code, so you do not hit the website repeatedly.
- Wrap your requests in try/except blocks and keep a log of errors, so a single failure does not stop the whole job.
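As a concrete illustration, here is a minimal sketch of a "polite" request helper that applies several of these practices: it checks the status code, pauses for a random amount of time, and saves the page locally. The helper name and the delay values are illustrative choices, not a fixed convention.

import time
import random
import requests

def polite_get(url, min_wait=0.5, max_wait=3.0):
    """Request a page, check the connection, and pause before returning."""
    page = requests.get(url)
    # 200 means the connection worked; anything else deserves attention
    if page.status_code != 200:
        print(f"Warning: status {page.status_code} for {url}")
    # be kind to the server: pause for a random draw of time
    time.sleep(random.uniform(min_wait, max_wait))
    return page

# usage: download the page once and save it locally
page = polite_get("https://www.cnn.com/politics")
with open("cnn_politics.html", "wb") as f:
    f.write(page.content)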
Scraping often involves the following routine:
1. Find a website with the information you want to collect.
2. Understand the website: inspect its source code and locate the tags associated with the information you need.
3. Write code to collect one realization of the data.
4. Build a scraper: generalize your code into a function and apply it to multiple cases.
And repeat!
A website is, in general, a combination of HTML, CSS, XML, PHP, and JavaScript. We will care mostly about HTML and CSS.
HTML forms what we call static websites: everything you see is there in the source code behind the page. JavaScript produces dynamic websites: sites that you browse and click through without the URL changing, typically powered by a database working behind the scenes.
Today we will deal with static websites using the Python library Beautiful Soup. Next class, we will learn how to work with dynamic websites using Selenium in Python.
HTML stands for HyperText Markup Language. As is explicit from the name, it is a markup language used to create web pages and is a cornerstone technology of the internet. It is not a programming language like Python, R, or Java. Web browsers read HTML documents and render them into visible or audible web pages.
See an example of an html file:
<html>
<head>
<title> Michael Cohen's Email </title>
<script>
var foot = bar;
</script>
</head>
<body>
<div id="payments">
<h2>Second heading</h2>
<p class='slick'>information about <br/><i>payments</i></p>
<p>Just <a href="http://www.google.com">google it!</a></p>
</body>
</html>
HTML code is structured using tags, and information is organized hierarchically (nested from top to bottom, like a tree).
Some of the most important tags we will use for scraping are:
- <p>: paragraphs of text
- <a href="...">: links to other pages
- <div>: divisions or containers that structure the page
- <h1> ... <h6>: headings
- <span>: inline containers (often wrapping headlines or dates)
<div class="alert alert-block alert-danger" style="font-size: 20px;"> Scraping is all about finding tags and collecting the data associated with them </div>
The tags are the target. The information we need from HTML usually comes from the text and the attributes of a tag. Very often your work will consist of finding the tag and then capturing the information you need. The figure below summarizes this distinction in HTML files.
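To make this distinction concrete, here is a minimal sketch (using the Beautiful Soup library we introduce below) that prints the tag name, the text, and one attribute for a single link from the example HTML above.

from bs4 import BeautifulSoup

# one line taken from the example html above
html = '<p>Just <a href="http://www.google.com">google it!</a></p>'
soup = BeautifulSoup(html, "html.parser")

link = soup.find("a")        # the tag is the target
print(link.name)             # tag name: 'a'
print(link.get_text())       # text between the tags: 'google it!'
print(link["href"])          # attribute: 'http://www.google.com'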
As you can anticipate, a huge part of the scraping work is to understand your website and find the tags/information you are interested in. There are two ways to go about it:
### Inspect the website: press command + shift + i, or right-click on the element and select Inspect.
### Use the SelectorGadget extension, available here: https://chromewebstore.google.com/detail/selectorgadget/mhjhnkcfbdhnjickkkdbjoemdmbfginb
To do webscraping, we will use two main libraries:
- requests: to make a get() request and access the HTML behind the pages
- BeautifulSoup: to parse the HTML

See this nice tutorial here to understand the difference between parsing HTML with BeautifulSoup and using text mining methods to scrape a website.
# install libraries - Take the # out if this is the first time you are installing these packages.
#!pip install requests
#!pip install beautifulsoup4
Let's scrape our first website. We will start by scraping some news from CNN.
# setup
import pandas as pd
import requests # For downloading the website
from bs4 import BeautifulSoup # For parsing the website
import time # To put the system to sleep
import random # for random numbers
# Get access to the website
url = "https://www.cnn.com/2025/10/26/politics/mamdani-sanders-aoc-rally-nyc"
page = requests.get(url)
# check object type - object class requests
type(page)
# check if you got a connection
page.status_code # 200 == Connection
# See the content.
# notice we downloaded the entire website.
# Do inspect to make sure of this in the web browser
page.content[0:1000]
After we make a request and retrieve a web page's content, we can store that content locally with Python's open() function. Saving the HTML source avoids hitting the website multiple times while you work on your code.
# save html locally
with open("cnn_news1", 'wb') as f:
f.write(page.content)
And here is how to open:
# open a locally saved html
with open("cnn_news1", 'rb') as f:
html = f.read()
# see it
print(html[0:1000])
Next, you will create a BeautifulSoup object. A BeautifulSoup object is just a parser: it allows us to easily access elements from the raw HTML.
# create a bs object.
# input 1: request content; input 2: tell you need an html parser
soup = BeautifulSoup(page.content, 'html.parser')
# Let's look at the raw code of the downloaded website
print(soup.prettify()[:1000])
With the parser, we can start looking at the data. The functions we will use the most are:
- .find_all(): to find tags by their names
- .select(): to select tags using a CSS selector
- .get_text(): to access the text inside a tag
- ["attr"]: to access the attributes of a tag

Let's start trying to grab all the textual information of the news. These are often under the tag <p>, for paragraph.
## find paragraph
cnn_par = soup.find_all('p')
cnn_par
## let's see how many we got
len(cnn_par)
## let's print one
cnn_par[3]
You see that you just parsed the full tag for all paragraphs of the text. Let's remove the HTML tags using the .get_text() method.
# get the text.
# This is what is in between the tags <p> TEXT </p>
cnn_par[0].get_text()
# use our friend list comprehension to parse all
all_par = [par.get_text() for par in cnn_par]
all_par
Notice, nevertheless, that you also collected some junk that is not the paragraph information you are looking for.
This happens because the tag <p> is used in multiple places (under different parent tags) across the page. For example, if you look at the last element of the all_par list, you will see that your scraper is collecting the footer of the webpage.
Be more specific. Work with a CSS selector.
A CSS selector is a pattern used to select and style one or more elements in an HTML document. It is a way to chain together tags, classes, and attributes of an HTML file.
Another way to do this is using XPath, which can be super useful to learn, but is a bit more complicated for beginners.
See this tutorial here to understand the concept of a css selector
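Before applying this to CNN, here is a minimal sketch of how .select() works with a few common selector patterns, using the small example HTML from earlier (the class and id names come from that example).

from bs4 import BeautifulSoup

html = """
<div id="payments">
  <h2>Second heading</h2>
  <p class='slick'>information about <br/><i>payments</i></p>
  <p>Just <a href="http://www.google.com">google it!</a></p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

print(soup.select("p"))           # by tag name: every <p>
print(soup.select(".slick"))      # by class: elements with class="slick"
print(soup.select("#payments"))   # by id: the element with id="payments"
print(soup.select("div p a"))     # descendant chain: <a> inside a <p> inside a <div>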
Let's use the SelectorGadget tool (see tutorial) to get a CSS selector for all the paragraphs.
# open your webbrowser and use the selector gadget
# website: https://www.cnn.com/2025/10/26/politics/mamdani-sanders-aoc-rally-nyc
# Use a css selector to target specific content
cnn_par = soup.select(".vossi-paragraph")
cnn_par[-1]
story_content = [i.get_text() for i in cnn_par]
story_content
story_content[0].strip()
# Clean and join together with string methods
story_text = "\n".join([i.strip() for i in story_content])
print(story_text)
# title
css_loc = "#maincontent"
story_title = soup.select(css_loc)
story_title[0]
story_title = story_title[0].get_text()
print(story_title)
# story date
story_date = soup.select(".vossi-timestamp")[0].get_text()
print(story_date)
# story authors
story_author = soup.select(".byline__name")[0].get_text()
print(story_author)
# let's nest all in a list
entry = [url, story_title.strip(),story_date.strip(),story_text]
entry
Your task: Look at this website: https://www.latinnews.com/latinnews-country-database.html?country=2156
Do the following tasks:
# write your solution here
After you have your scraper working for a single case, you will generalize your work. As we learned before, we do this by creating a function (or a full class with different methods; see the sketch further below).
Notice that here it is also important to add, inside our functions, the good practices that try to imitate human behavior.
Let's start with a function to scrape news from CNN.
# Building a scraper
# The idea here is to just wrap the above in a function.
# Input: url
# Output: relevant content
def cnn_scraper(url=None):
    '''
    This function scrapes relevant content from a CNN story page.
    input: str, url of a CNN story
    output: dict with the url, title, date, and text (None if the connection fails)
    '''
    # Get access to the website
    page = requests.get(url)

    # check if you got a connection
    if page.status_code != 200:
        return None

    # create a bs object.
    # input 1: request content; input 2: tell you need an html parser
    soup = BeautifulSoup(page.content, 'html.parser')

    # parse text
    cnn_par = soup.select(".vossi-paragraph")
    story_content = [i.get_text() for i in cnn_par]
    story_text = "\n".join([i.strip() for i in story_content])

    # parse title
    story_title = soup.select("#maincontent")[0].get_text()

    # story date
    story_date = soup.select(".vossi-timestamp")[0].get_text()

    # story authors
    story_author = soup.select(".byline__name")[0].get_text()

    # nest all in a dictionary
    entry = {"url": url,
             "story_title": story_title.strip(),
             "story_date": story_date.strip(),
             "text": story_text}

    # return
    return entry
Let's see if our function works:
# Test on the same case
url = "https://www.cnn.com/2025/10/26/politics/mamdani-sanders-aoc-rally-nyc"
scrap_news = cnn_scraper(url=url)
scrap_news
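As mentioned above, you could also organize this work as a class with different methods instead of a standalone function. Here is a minimal sketch of what that could look like; the class and method names are illustrative choices, and the parsing simply reuses the cnn_scraper function defined above.

class CNNScraper:
    """Small illustrative wrapper around the scraping steps above."""

    def __init__(self, min_wait=0.5, max_wait=3.0):
        # how long to sleep between requests (be kind to the server)
        self.min_wait = min_wait
        self.max_wait = max_wait

    def scrape_one(self, url):
        """Scrape a single story, reusing cnn_scraper."""
        return cnn_scraper(url=url)

    def scrape_many(self, urls):
        """Scrape a list of stories, sleeping between requests."""
        entries = []
        for url in urls:
            entries.append(self.scrape_one(url))
            time.sleep(random.uniform(self.min_wait, self.max_wait))
        return entries

# usage
scraper = CNNScraper()
scraper.scrape_one("https://www.cnn.com/2025/10/26/politics/mamdani-sanders-aoc-rally-nyc")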
Let's now assume you actually have a list of urls. We will iterate our scraper through this list.
# create a list of urls
urls = ["https://www.cnn.com/2025/10/26/politics/mamdani-sanders-aoc-rally-nyc",
"https://www.cnn.com/2025/10/26/politics/jessica-tisch-zohran-mamdani-election",
"https://www.cnn.com/2025/10/24/politics/fact-check-ballroom-press-secretary",
"https://www.cnn.com/2025/09/26/politics/cnn-independents-poll-methodology"]
# Then just loop through and collect
scraped_data = []
for url in urls:
# Scrape the content
scraped_data.append(cnn_scraper(url))
# Put the system to sleep for a random draw of time (be kind)
time.sleep(random.uniform(.5,1))
print(url)
# Look at the data object
scraped_data[3]
# Organize as a pandas data frame
dat = pd.DataFrame(scraped_data)
dat.head()
This completes all the steps of scraping: we found a website, understood its structure, collected one realization of the data, generalized our code into a function, and applied it to multiple cases.
It is unlikely you will ever start with a complete list of the urls you want to scrape. Most likely, collecting the full list of sources will itself be a step in your scraping task. Remember, urls usually come embedded as tag attributes. So let's write a function to collect multiple urls from the CNN website, following all our pre-determined steps.
# Step 1: Find a website with information you want to collect
## let's get links on cnn politics
url = "https://www.cnn.com/politics"
url
# Step 2: Understand the website
# links are embedded across multiple titles.
# these titles can be targeted with the following CSS selector: .container__headline span
# Step 3: Write code to collect one realization of the data
# Get access to the website
page = requests.get(url)
# create a bs object.
soup = BeautifulSoup(page.content, 'html.parser') # input 1: request content; input 2: tell you need an html parser
# Step 3: Write code to collect one realization of the data
# with a css selector
links = soup.select(".container_lead-plus-headlines__item--type-section")
#links = soup.select(".container_lead-plus-headlines__headline")
links
links[0]
links[0].attrs["data-open-link"]
# grab links
links_from_cnn = []
# iterate
for link in links:
links_from_cnn.append(link["data-open-link"])
# print
print(links_from_cnn)
## another way to do this is by using href attributes of a tag
links_from_cnn = []
# Extract the href attribute of every <a> tag
for tag in soup.find_all("a"):
href = tag.attrs.get("href")
links_from_cnn.append(href)
# much more extensive set of links
print(links_from_cnn)
import re
## clean the output.
# Keep only stories whose path starts with "/" followed by four digits (the year)
# combine with the base url
links_from_cnn_reduced = ["https://www.cnn.com" + l for l in links_from_cnn if re.match(r'^/(\d{4})', str(l))]
links_from_cnn_reduced
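The same story link can appear more than once on a page (for example, in a headline and in a teaser), so you may also want to deduplicate the list while preserving its order. A minimal sketch:

# remove duplicate links while keeping their original order
links_from_cnn_unique = list(dict.fromkeys(links_from_cnn_reduced))
len(links_from_cnn_reduced), len(links_from_cnn_unique)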
## Step 4: Build a scraper -- generalize your code into a function.
# Let's write the above as a single function
def collect_links_cnn(url=None):
    """Collect story links from a CNN section page.
    Args:
        url (str): a valid CNN section page (e.g., https://www.cnn.com/politics).
    Returns:
        list: story urls found on the page.
    """
    # Get access to the website
    page = requests.get(url)

    # create a bs object.
    # input 1: request content; input 2: tell you need an html parser
    soup = BeautifulSoup(page.content, 'html.parser')

    # collect the href attribute of every <a> tag
    links_from_cnn = []
    for tag in soup.find_all("a"):
        href = tag.attrs.get("href")
        links_from_cnn.append(href)

    # clean the output:
    # keep only stories whose path starts with "/" followed by four digits (the year)
    # and combine them with the base url
    links_from_cnn_reduced = ["https://www.cnn.com" + l
                              for l in links_from_cnn
                              if re.match(r'^/(\d{4})', str(l))]

    return links_from_cnn_reduced
links_cnn = collect_links_cnn("https://www.cnn.com/politics")
links_cnn[:10]
With this list, you can apply your scraper function to multiple links:
len(links_cnn)
# let's get some cases
links_cnn_ = links_cnn[10:15]
# Then just loop through and collect
scraped_data = []
for url in links_cnn_:
# check what is going on
print(url)
# Scrape the content
scraped_data.append(cnn_scraper(url))
# Put the system to sleep for a random draw of time (be kind)
time.sleep(random.uniform(.5,3))
# save as pandas df
# Organize as a pandas data frame
dat = pd.DataFrame(scraped_data)
dat.head()
# add a url that will break our scraper (it is not a CNN story page)
links_cnn_ = links_cnn[10:15]
links_cnn_.append("https://www.latinnews.com/component/k2/item/107755.html?period=2025&archive=3&Itemid=6&cat_id=837700:in-brief-brazil-s-current-account-deficit-widens-in-september")
# run the loop in a safe setup that catches errors
# Then just loop through and collect
scraped_data = []
list_of_errors = []
for url in links_cnn_:
# check what is going on
print(url)
# Scrape the content
try:
scraped_data.append(cnn_scraper(url))
# Put the system to sleep for a random draw of time (be kind)
time.sleep(random.uniform(.5,3))
except Exception as e:
list_of_errors.append([url, e])
dat = pd.DataFrame(scraped_data)
dat.head()
list_of_errors
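Because list_of_errors stores both the url and the exception, you can inspect what went wrong and, if the failures look transient, retry those urls with the same try/except pattern. A minimal sketch:

# retry the urls that failed, reusing the same safe pattern
still_failing = []
for url, error in list_of_errors:
    print("retrying:", url, "| previous error:", error)
    try:
        scraped_data.append(cnn_scraper(url))
        time.sleep(random.uniform(.5, 3))
    except Exception as e:
        still_failing.append([url, e])
still_failing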
Now, return to https://www.latinnews.com/latinnews-country-database.html?country=2156
Do the following:
# setup
import pandas as pd
import requests # For downloading the website
from bs4 import BeautifulSoup # For parsing the website
import time # To put the system to sleep
import random # for random numbers
import re # regular expressions
!jupyter nbconvert _week-07_scraping_static.ipynb --to html --template classic