Skip to main content

53. Using Multiprocessing for Parsing

Parsing large amounts of HTML can be CPU-intensive. Use multiprocessing to parallelize parsing.


Example

from multiprocessing import Pool
from bs4 import BeautifulSoup

html_docs = ["<html><title>Test</title></html>"] * 10

def parse(doc):
soup = BeautifulSoup(doc, "html.parser")
return soup.title.string

with Pool(4) as pool:
titles = pool.map(parse, html_docs)
print(titles)

Wrap-Up

Multiprocessing speeds up CPU-heavy tasks like parsing.