[Python] Fetching Instagram Data with Beautiful Soup #2


jhbaek 2018. 6. 5. 23:34

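Continuing from the previous post, let's keep crawling Instagram.

The first task is to run the Chrome process hidden, with no visible browser window.

Almost all of the information for this came from https://beomi.github.io/2017/01/20/HowToMakeWebCrawler/.

It is very simple: add the headless argument to the options passed to chromedriver, and that's it.

On top of that, I added code that extracts just the text from the 'tag count' read via the span tag.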

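from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://www.instagram.com/explore/tags/jmt/"

# Run Chrome without a visible window.
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--disable-gpu')

# chromedriver must be on PATH; `options=` replaces the deprecated `chrome_options=`.
driver = webdriver.Chrome(options=options)
try:
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, "html.parser")
finally:
    driver.quit()  # always shut the hidden browser down

# Instagram's class names are generated and may change at any time.
# The trailing space is dropped because BeautifulSoup matches individual class tokens.
tag = soup.find("span", {"class": "g47SY"})
count = tag.text if tag is not None else None
print(count)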

https://github.com/100lab/poc_crawling_insta/commit/506a054324bc4144b5117dfbff59a53b18d0b3cc
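
The `count` printed above is still a string. If it needs to be used as a number, a small helper along these lines could convert it. This is only a sketch: `parse_count` is a hypothetical name, and it assumes the span text is a plain number with optional thousands separators such as "1,234,567", which the original post does not confirm.

```python
import re


def parse_count(text):
    """Convert a count string such as "1,234,567" to an int.

    Hypothetical helper: assumes the text holds a plain number,
    optionally with thousands separators or surrounding spaces.
    """
    digits = re.sub(r"[^0-9]", "", text)  # drop commas, spaces, etc.
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)


print(parse_count("1,234,567"))
```

Raising `ValueError` on text without digits makes it obvious when the page layout has changed and the span no longer holds a number.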

Next time, let's build a RESTful server with an API that returns the value above whenever a request comes in.
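
As a rough idea of what such an API might look like, here is a minimal sketch using only Python's standard library. It is not necessarily how the follow-up post does it. The names `fetch_count`, `build_payload`, `CountHandler`, and `run`, the port 8000, and the URL shape (`/jmt` meaning the tag `jmt`) are all assumptions for illustration, and `fetch_count` is a placeholder that would need to wrap the Selenium and BeautifulSoup code above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_count(tag):
    """Placeholder (hypothetical): would run the crawler code for `tag`."""
    return "0"


def build_payload(tag, count):
    """Encode the tag and its count as a JSON response body."""
    return json.dumps({"tag": tag, "count": count}).encode("utf-8")


class CountHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Assumption: the request path, e.g. /jmt, is the hashtag name.
        tag = self.path.strip("/")
        body = build_payload(tag, fetch_count(tag))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Serve requests on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), CountHandler).serve_forever()
```

Calling `run()` and then requesting `http://127.0.0.1:8000/jmt` would return a small JSON object containing the tag and the placeholder count.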

License: Attribution, NonCommercial, NoDerivatives

