[Tutorial] Scrapy and Django

I spent a lot of time scouring the web for scattered information on this, so I’m posting what I learned here for future reference, and so that you might not have to do the same. This is a tutorial on how to set up scrapy within your django project so that you can access your django models from scrapy. DjangoItem is not used here: it is not well documented and wasn’t letting me set my objects with information, which doesn’t surprise me given its experimental status. I set up scrapy in the top-level directory of the django project, so that scrapy.cfg and manage.py are on the same level. With this layout the normal “scrapy crawl [spider]” wasn’t working for me (it has something to do with paths and the project structure), but that’s fine because I wrote a manage.py command to run the spider instead. At any rate, here we go:

1) Where do I put my scrapy project directory?
Organize it such that your scrapy.cfg is in the same folder as manage.py, and your scrapy project folder is in that folder as well. Both django’s and scrapy’s project creation commands generate a directory to hold all the files they create, so for scrapy you’re going to want to copy its contents up a level (the -r is needed so the project folder gets copied too): cp -r scrapyproject/* ./
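
To make the target layout concrete, here is a small sketch that checks for it. The function name check_layout and the folder name scrapyproject are placeholders of mine, not something scrapy or django provides, so swap in your own names.

```python
from pathlib import Path


def check_layout(root, scrapy_package="scrapyproject"):
    """Return the expected paths that are missing under the project root."""
    root = Path(root)
    expected = [
        root / "manage.py",                     # django entry point
        root / "scrapy.cfg",                    # scrapy config, next to manage.py
        root / scrapy_package / "settings.py",  # scrapy project settings
    ]
    return [str(p.relative_to(root)) for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = check_layout(".")
    print("layout looks right" if not missing else "missing: " + ", ".join(missing))
```

Run it from the folder that holds manage.py; an empty result means the three files are where this tutorial expects them.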

2) How do I run my spider?

First you need to add
import os
os.environ['DJANGO_SETTINGS_MODULE'] = 'yourdjangoproject.settings'
to your scrapy project settings (the settings.py inside your scrapy project folder).
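
Here is roughly how that could look as a block at the top of the scrapy settings file. The helper name point_scrapy_at_django and the optional sys.path tweak are my own additions (a guess at the path trouble mentioned above), so treat this as a sketch rather than a required recipe.

```python
import os
import sys


def point_scrapy_at_django(settings_module, project_root=None):
    """Set up the environment so django models can be imported from scrapy."""
    if project_root is not None and project_root not in sys.path:
        # Make the django project importable no matter where scrapy is started.
        sys.path.insert(0, project_root)
    os.environ['DJANGO_SETTINGS_MODULE'] = settings_module


point_scrapy_at_django('yourdjangoproject.settings')

# On django 1.7 and later, the app registry must also be loaded before
# any models are imported:
# import django
# django.setup()
```

The django.setup() lines are left commented out because they only apply to newer django versions and need a real settings module to succeed.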

Then copy this manage command into the management/commands/ directory of one of your django apps and use it to run scrapy.
https://gist.github.com/2975718

I called this file scraper.py, so I could run
python manage.py scraper crawl [spidername]
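
If you want to see the general shape of such a command, here is a minimal sketch. It is not the code from the gist, just my own version under a few assumptions: the file lives at yourapp/management/commands/scraper.py (hypothetical app name), build_scrapy_argv is a helper I made up, and only plain arguments like crawl [spidername] are forwarded.

```python
try:
    from django.core.management.base import BaseCommand
except ImportError:
    # Fallback so this sketch can be imported where django is not installed.
    BaseCommand = object


def build_scrapy_argv(args):
    """Turn ['crawl', 'spidername'] into the argv list scrapy expects."""
    return ['scrapy'] + list(args)


class Command(BaseCommand):
    help = 'Run a scrapy command from manage.py'

    def add_arguments(self, parser):
        # Collect everything after the command name, e.g. crawl spidername.
        parser.add_argument('scrapy_args', nargs='*')

    def handle(self, *args, **options):
        # Imported here so django finishes loading before scrapy starts.
        from scrapy.cmdline import execute
        execute(build_scrapy_argv(options['scrapy_args']))
```

Note that scrapy’s execute ends the process when the crawl finishes, so nothing placed after that call in handle will run.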

3) How do I access my django models from my spider?
import MYPROJECT.model.models as models
myobject = models.MyObject()
myobject.somefield = 3
myobject.save()
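
If your spider saves a lot of objects, the same create, set, save pattern can be wrapped in a small helper. The name save_as_model is hypothetical, and it only relies on the model having a no-argument constructor and a save() method, as django models do.

```python
def save_as_model(model_cls, fields):
    """Create a model instance, set the given fields, and save it."""
    obj = model_cls()
    for name, value in fields.items():
        setattr(obj, name, value)
    obj.save()  # writes the row to the database
    return obj


# Inside a spider's parse method this might look like:
#     save_as_model(models.MyObject, {'somefield': 3})
```

This keeps the spider code short, at the cost of hiding which fields are being set behind a dictionary.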