In this video we will look at Python Scrapy and how to create a spider to crawl websites to scrape and structure data.

Source: https://phanrangninhthuan.info/

35 Comments

CS-Learning / Music

July 4, 2020

Is Kite necessary for this code? I have Anaconda installed.
Looking forward to your reply.

Que Caldway

July 4, 2020

Ridiculously awesome video! Def an amazing teaching and great start to web scraping with scrapy. Dope Stuff!

Davy Roger

July 4, 2020

What is your developer tools setup in Chrome?

Melisa Liu

July 4, 2020

Do you know Octoparse (http://octoparse.es/)? With this tool you don't need to write Python by hand to capture data. It can automatically identify the structure of a web page and quickly extract the data to Excel or to a database. I think this tool is very easy to use…

CS Success Gamer

July 4, 2020

made it to 28:00 but my posts.json has no data.

CS Success Gamer

July 4, 2020

Can you pick up viruses from websites while you scrape?

Savaş Uyar

July 4, 2020

Thank you.. Great video.

Mladen

July 4, 2020

Can you please explain why you used yield on lines 13 and 21 in the final version of the code? Does this mean parse is a generator function in this case? How does this work under the hood?
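
On the yield question: any Python function that contains yield is a generator function, so calling it builds a generator object instead of running the body right away. The sketch below uses plain Python only. The parse function and sample_posts list are hypothetical stand-ins, not the code from the video and not Scrapy's internals; the point is only that whoever consumes the generator pulls items one at a time.

```python
import types


def parse(posts):
    """Toy stand-in for a spider's parse(): yields one dict per post."""
    for post in posts:
        # Each yield hands one item to the consumer and pauses here
        # until the consumer asks for the next one.
        yield {"title": post["title"].strip()}


# Hypothetical input standing in for the selectors parse() would loop over.
sample_posts = [{"title": " First post "}, {"title": "Second post"}]

gen = parse(sample_posts)  # nothing has run yet; this only builds a generator
is_generator = isinstance(gen, types.GeneratorType)
items = list(gen)  # the consumer pulls the yielded items one by one
```

In a real spider, Scrapy plays the role of the consumer: it iterates over whatever parse yields and treats dicts as scraped items and Request objects as further pages to fetch.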

jack zero_one

July 4, 2020

Why don't you go with:
scrapy genspider spider_name www.example.com
Any reason?

slookify

July 4, 2020

How do I select all the text from, for example, <a class="new-class">?
I don't want the text from other classes.
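
On selecting by class: in Scrapy this is usually written as response.css('a.new-class::text').getall(), or with XPath as response.xpath('//a[@class="new-class"]/text()').getall(). The sketch below shows the same attribute filter using only the standard library so it runs without Scrapy installed. The HTML snippet is made up for illustration, and ElementTree needs well-formed markup, which real pages often are not.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed snippet with two classes of links.
html = """<div>
  <a class="new-class" href="/one">First</a>
  <a class="other" href="/two">Skip me</a>
  <a class="new-class" href="/three">Second</a>
</div>"""

root = ET.fromstring(html)

# Keep only <a> elements whose class attribute is exactly "new-class".
texts = [a.text for a in root.findall(".//a[@class='new-class']")]
```

Note that the attribute test is an exact match, so an element with class="new-class extra" would be skipped here, whereas the CSS form a.new-class would still match it.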

Imen Khiari

July 4, 2020

Nice tutorial! In case you guys want to use xpath instead of css (I personally prefer xpath… 🙂 ) :
for post in response.xpath('//*[@class="post-header"]'):
    yield {
        'title': post.xpath('.//h2/a/text()').get(),
        'date': post.xpath('.//*[@class="date"]/a/text()').get(),
        'author': post.xpath('.//*[@class="author"]/a/text()').get()
    }

dietermitplatten

July 4, 2020

8:41 I don't understand how that works. He declared a start_urls list and then doesn't use it?
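
On start_urls: the spider never reads the list itself because the base class does. When a spider does not define its own start_requests, Scrapy's default creates one request per entry in start_urls and sends each response to parse. The sketch below imitates that pattern with a toy base class. ToySpider, PostsSpider, and the example.com URLs are hypothetical, and the (url, callback) tuples stand in for scrapy.Request objects.

```python
class ToySpider:
    """Simplified stand-in for scrapy.Spider, for illustration only."""

    start_urls = []

    def start_requests(self):
        # Default behaviour: one request per entry in start_urls,
        # each routed to self.parse.
        for url in self.start_urls:
            yield (url, self.parse)

    def parse(self, response):
        raise NotImplementedError


class PostsSpider(ToySpider):
    # The subclass only declares the list; the base class consumes it.
    start_urls = ["https://example.com/page/1", "https://example.com/page/2"]

    def parse(self, response):
        return f"parsed {response}"


spider = PostsSpider()
requests = list(spider.start_requests())
scheduled = [url for url, _ in requests]
results = [callback(url) for url, callback in requests]
```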

Eric Beard

July 4, 2020

Your videos are always great.

A lot of other coding vids built on python talk about simple math for 8 hours and I learn nothing.

Elvin Bəşirli

July 4, 2020

I tried to install Scrapy and got an error. The installation documentation recommends Conda, so I installed Scrapy from the Anaconda Prompt. Then I tried to start a project (scrapy startproject <project name>) and now get this error, which I cannot solve: "Fatal error in launcher: Unable to create process using '"d:bldscrapy_1587736721630_h_envpython.exe" ". Can you help, please?

Norishfx Dragneel

July 4, 2020

I need some information and inspiration for my marketing ideas and for understanding human behaviour. Can I use it on Facebook? Will Facebook ban me for that, or is it safe?

L4ZZA

July 4, 2020

Let's say you need to distribute this program to people who don't know Python. How do you package the project into an executable that can be run by double-clicking, or from the command line with arguments?

Doston Toshpulatov

July 4, 2020

Great video, simple explanation. Thank you.

Paul Shop

July 4, 2020

I can't seem to make this code work in Python IDLE. I'm up to 22:24 and I get the output in the Scrapy shell, but I can't make it work in Python IDLE 3.8.2. Please help.

Cemre Acar

July 4, 2020

It could be helpful
http://www.cemreacar.com/blog/scrapy-framework-ile-web-scraping

A. D.

July 4, 2020

you're the man!!

Simone Truglia

July 4, 2020

Thank you so much. My first time with Scrapy and you've been really clear. Great video. Thanks mate 🙂

Rahul T 2003

July 4, 2020

Thank you very much for this tutorial! It's nice, short and crisp!

Flora Posteschild

July 4, 2020

Hi: great video, but maybe out of date re. your website? I had to dig deeper to get the correct info for date and author in the html:

yield {
    'title': post.css('.post-header h2 a::text')[0].get(),
    'date': post.css('.post-header .byline span a::text')[0].get(),
    'author': post.css('.post-header .byline span a::text')[1].get()
}

And I was getting a "no referer" error, so I had to paste this in near the top of the spider class:

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36",
    "referer": "https://www.google.com/"
}

def start_requests(self):
    yield scrapy.Request(self.start_urls[0], headers=self.headers, callback=self.parse)

Josue Anyosa Galvez

July 4, 2020

4:38 When I type 'import scrapy' I get the message "unresolved import 'scrapy' Python(unresolved-import)". I am using VS Code.

taimoor arif

July 4, 2020

Do you have a course on this, or a playlist with other Scrapy videos?

PlayerX

July 4, 2020

Let's scrape … the scraping blog! I had a good laugh. Your courses are amazing and every now and then we get a good laugh. Keep up the excellent work.

Azamat Shaimerden

July 4, 2020

Love your tutorial, man. Thank you. With Scrapy, can we scrape millions of records at sequenced or scheduled intervals so we don't get blacklisted, and keep updating our file?
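
On pacing requests: Scrapy exposes throttling through project settings. The fragment below is a sketch of what a settings.py might contain. The setting names are Scrapy's own, but every value is an illustrative assumption rather than a recommendation, and re-running the crawl on a schedule would have to come from something outside Scrapy, such as cron.

```python
# settings.py (sketch): values are illustrative assumptions, tune per site

ROBOTSTXT_OBEY = True               # respect the site's robots.txt rules
DOWNLOAD_DELAY = 2                  # seconds to wait between requests
CONCURRENT_REQUESTS_PER_DOMAIN = 1  # one request at a time per domain

# AutoThrottle adjusts the delay based on how quickly the server responds.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1        # initial delay in seconds
AUTOTHROTTLE_MAX_DELAY = 30         # upper bound when the server is slow
```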

marvin evins

July 4, 2020

wish you did a whole series on this

marvin evins

July 4, 2020

great lesson. After doing some webscraping with selenium, this finally made a lot of sense because I was lost a month ago

Puppet Review

July 4, 2020

Nice tutorial. I have an issue: files like posts.json or posts-1.html are not getting created. Any tips?

Ayari Imen

July 4, 2020

What if the website is heavy on JS? And how do you handle a robots.txt that explicitly disallows Scrapy? :/
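
On robots.txt: whether Scrapy honours the file is controlled by the ROBOTSTXT_OBEY setting in the project's settings.py. To see what a given robots.txt permits before crawling, the standard library's urllib.robotparser can evaluate the rules, as in this sketch. The robots.txt content, the user-agent names, and the example.com URLs are made up for illustration. The JavaScript part of the question is not covered here.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt that blocks one crawler entirely
# and everyone else only from /private/.
robots_txt = """\
User-agent: Scrapy
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The user agent named "Scrapy" is blocked from every path.
scrapy_allowed = parser.can_fetch("Scrapy", "https://example.com/blog/")

# Any other user agent falls back to the "*" rules.
other_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/blog/")
other_private = parser.can_fetch("SomeOtherBot", "https://example.com/private/x")
```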

hayadatube

July 4, 2020

Hey guys, I searched the web and even followed the installation instructions on the Scrapy website, but it doesn't work.

Does anybody have a working tutorial for installing Scrapy on Ubuntu 18.04 or Windows 10? Many thanks in advance…

The Gripmaster

July 4, 2020

Please use pipenv. venv is outdated

DontMindMe

July 4, 2020

Is kite copilot safe to use?

我曹无情

July 4, 2020

It's a pity that I don't understand English
