In a few past projects I needed to scrape data from websites that did not offer access via an API. I was using C# at the time, and scraping the web with Html Agility Pack was quite easy.
I now spend most of my time in macOS because of work projects, so when I needed to do some web scraping again I did not want to install and set up Mono to do it in C#. I decided to go with Swift, as I am now quite comfortable with the language after 4 years of using it daily.
The first thing I needed to do was find a library to parse HTML, a Swift equivalent of Html Agility Pack. I found SwiftSoup.
SwiftSoup can parse whole HTML documents as well as HTML fragments. The usage is quite simple; you just need to know a thing or two about
Let’s say you want to parse the Hacker News main page and scrape posts containing some specific keywords.
This is quite an artificial example, but the idea is simple. You use the developer tools in your browser of choice to inspect the HTML of the parts of the website you are interested in, and then try to reach them by descending and filtering the
You first need to read the website and parse it:
import Foundation
import SwiftSoup

let content = try String(contentsOf: URL(string: "https://news.ycombinator.com/")!)
let doc: Document = try SwiftSoup.parse(content)
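The force-unwrapped URL and the bare `try` are fine for a quick experiment, but a slightly more defensive variant might look like the sketch below. `ScrapeError` and `loadDocument(from:)` are names made up for this example and are not part of SwiftSoup, and the page is assumed to be UTF-8 encoded.

```swift
import Foundation
import SwiftSoup

// Hypothetical error type for this sketch; not part of SwiftSoup.
enum ScrapeError: Error {
    case invalidURL(String)
}

// Hypothetical helper: downloads the page at `address` and parses it
// into a SwiftSoup Document.
func loadDocument(from address: String) throws -> Document {
    // URL(string:) returns nil for malformed input, so avoid force-unwrapping.
    guard let url = URL(string: address) else {
        throw ScrapeError.invalidURL(address)
    }
    // This call blocks the current thread, which is acceptable in a small script.
    // Assumption: the page is served as UTF-8.
    let content = try String(contentsOf: url, encoding: .utf8)
    // Passing the address as base URI lets SwiftSoup resolve relative links.
    return try SwiftSoup.parse(content, address)
}

do {
    let doc = try loadDocument(from: "https://news.ycombinator.com/")
    print(try doc.title())
} catch {
    print("Scraping failed: \(error)")
}
```

Wrapping the call in `do`/`catch` means a network failure or a malformed address is reported instead of crashing the script.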
Looking at the HTML you can see it uses a table layout and all the posts are in rows of a table with a class called