Besides what we saw in Note 1, we can add some details to our code. Sometimes we need to pretend to be a browser to obtain the content of a page, which we can do by adding headers:
import urllib2
url = 'http://drapor.me'
headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
request = urllib2.Request(url, headers=headers)  # headers is the third argument, so pass it by keyword
response = urllib2.urlopen(request)
print response.read()
For more about headers, you can check this article.
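If you prefer, you can also set headers one at a time on an existing Request with add_header(), which has the same effect as passing the headers dict. A minimal sketch:

import urllib2
request = urllib2.Request('http://drapor.me')
request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')  # same header as above
response = urllib2.urlopen(request)
print response.read()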
And if you need to post some data, such as a username and password, to a site, try this:
import urllib
import urllib2
values = {"username": "drapor", "password": "*****"}
data = urllib.urlencode(values)  # encode the form fields as a query string
url = "http://www.heibanke.com/lesson/crawler_ex01/"  # a crawler game I found before, which is quite interesting
request = urllib2.Request(url, data)  # supplying data turns this into a POST request
response = urllib2.urlopen(request)
print response.read()
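Note that supplying data to Request makes urllib2 send a POST. If a site expects the same fields in a GET request instead (whether this particular game does is an assumption), you can append the encoded string to the URL yourself. A quick sketch:

import urllib
import urllib2
values = {"username": "drapor", "password": "*****"}
get_url = "http://www.heibanke.com/lesson/crawler_ex01/" + "?" + urllib.urlencode(values)  # fields go into the query string
response = urllib2.urlopen(get_url)
print response.read()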
If you find there is something wrong with your request, you can catch the error like this:
import urllib2
request = urllib2.Request('http://www.xxxxx.com')
try:
    urllib2.urlopen(request)
except urllib2.URLError, e:
    print e.reason
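URLError only tells you why the connection failed; when the server answers with an HTTP error status (404, 403 and so on), urllib2 raises its subclass HTTPError, which also carries the status code. Since it is a subclass, catch it first. A minimal sketch:

import urllib2
try:
    urllib2.urlopen('http://www.xxxxx.com')
except urllib2.HTTPError, e:  # subclass of URLError, so it must come first
    print e.code
except urllib2.URLError, e:
    print e.reason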
And if you need to use cookies, try this:
import urllib2
import cookielib
cookie = cookielib.CookieJar()  # declare a CookieJar object to store the cookies
handler = urllib2.HTTPCookieProcessor(cookie)  # wrap the jar in urllib2's cookie handler
opener = urllib2.build_opener(handler)  # build an opener from the handler
response = opener.open('http://drapor.me')  # opener.open() works just like urllib2.urlopen()
for item in cookie:
    print 'Name = ' + item.name
    print 'Value = ' + item.value
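If you want the cookies to survive between runs, cookielib also provides MozillaCookieJar, which can save them to a file and load them back later. A minimal sketch, where cookie.txt is just a file name I made up:

import urllib2
import cookielib
filename = 'cookie.txt'  # hypothetical local file to keep the cookies in
cookie = cookielib.MozillaCookieJar(filename)
handler = urllib2.HTTPCookieProcessor(cookie)
opener = urllib2.build_opener(handler)
response = opener.open('http://drapor.me')
cookie.save(ignore_discard=True, ignore_expires=True)  # also keep session and expired cookies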