Published 2020. 7. 31. 23:37

Learning Python Crawling - the requests and beautifulsoup4 Libraries

As a simple exercise, I requested an HTML page by URL, parsed it, and extracted its title tag.
Below is a brief write-up of the steps.

1. Installing Python

There are several ways to install Python; I used Homebrew.
Homebrew install for macOS : brew.sh/#install

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"

Install Python after installing Homebrew

brew install python

Check the version after installing Python (with Homebrew the command may be python3 instead of python)

python --version

2. Installing the beautifulsoup4 and requests libraries

# Install the requests library
pip install requests

# Install the beautifulsoup4 library
pip install beautifulsoup4

3. Importing the libraries

import requests
from bs4 import BeautifulSoup
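
A quick way to confirm that both installs from step 2 worked is to import the packages and print their versions. This is only a minimal sketch; it assumes that both packages expose a `__version__` attribute, which current releases of requests and bs4 do.

```python
import bs4
import requests

# Both packages expose their version as a plain string.
print("requests:", requests.__version__)
print("beautifulsoup4:", bs4.__version__)
```

If either import fails with ModuleNotFoundError, the package was probably installed for a different Python interpreter than the one running the script.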

4. Fetching an HTML page with the requests library

requests guide : requests.readthedocs.io/en/master/

# Request the HTML page with the requests library.
# The response is stored in res; the raw HTML bytes are available as res.content.
res = requests.get('https://www.naver.com')
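
By default, requests.get does not raise an error for HTTP error statuses such as 404 or 500, and it has no timeout unless one is passed in. The sketch below wraps the call in a small helper that handles both. The name fetch_html and the 10-second default are assumptions for illustration, not part of the original exercise.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> bytes:
    """Return the raw response body for url (hypothetical helper)."""
    # timeout keeps the call from hanging if the server never answers.
    res = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # silently handing an error page to the parser.
    res.raise_for_status()
    return res.content
```

Calling fetch_html('https://www.naver.com') would return the same kind of bytes as res.content above.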

5. Parsing the HTML with the beautifulsoup4 library

# Parse the HTML source into a BeautifulSoup object.
soup = BeautifulSoup(res.content, 'html.parser')
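
BeautifulSoup does not need a live request to work: it accepts any HTML passed in as str or bytes. The sketch below parses a small hand-written document so the parsing step can be tried offline. The sample HTML is made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up sample document standing in for res.content.
sample_html = (
    b"<html><head><title>Sample page</title></head>"
    b"<body><p>Hello</p></body></html>"
)

# 'html.parser' is the parser that ships with Python's standard library.
soup = BeautifulSoup(sample_html, "html.parser")

print(soup.title)         # the parsed <title> tag
print(soup.p.get_text())  # text inside the first <p> tag
```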

6. Getting the title tag from the parsed HTML

# Get the title tag from the HTML source.
# The find method searches for a tag by name.
title = soup.find('title')
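
One thing worth knowing about find: it returns None when no matching tag exists. A minimal sketch of that behaviour, using two made-up HTML strings:

```python
from bs4 import BeautifulSoup

with_title = BeautifulSoup(
    "<html><head><title>Hello</title></head></html>", "html.parser"
)
without_title = BeautifulSoup(
    "<html><body><p>No title here</p></body></html>", "html.parser"
)

# find returns the first matching tag ...
print(with_title.find("title"))
# ... and None when nothing matches.
print(without_title.find("title"))

# soup.title is a shortcut for soup.find("title").
print(with_title.title == with_title.find("title"))
```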

7. Printing the title

# Print the text of the title tag
print(title.get_text())
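
Real pages sometimes have no title tag, and title text often carries surrounding whitespace, so the printing step can be made a little more defensive. The helper name title_text and its default argument are assumptions for illustration.

```python
from bs4 import BeautifulSoup


def title_text(html, default=""):
    """Return the stripped <title> text, or default if there is no title.

    Hypothetical helper, not part of the original exercise.
    """
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("title")
    if title is None:
        return default
    # strip=True removes leading and trailing whitespace from the text.
    return title.get_text(strip=True)


print(title_text("<title>\n  Example  \n</title>"))
print(title_text("<p>no title</p>", default="(untitled)"))
```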

<python code>

import requests
from bs4 import BeautifulSoup

# Request the HTML page with the requests library.
# The response is stored in res; the raw HTML bytes are available as res.content.
res = requests.get('https://www.naver.com')

# Parse the HTML source into a BeautifulSoup object.
soup = BeautifulSoup(res.content, 'html.parser')

# Get the title tag from the HTML source.
# The find method searches for a tag by name.
title = soup.find('title')

# Print the text of the title tag
print(title.get_text())
