Identifying User Agents for Websites with Python
The User-Agent is an HTTP request header that a client (a browser or a program) sends to the server with each request. Its value describes the client: typically the device type, the operating system, and the browser. Servers can read it to learn what kind of client is making the request and adapt the content they return.
What the User Agent Is Used For
Content adaptation: serving a design and content suited to the browser and device.
Statistical analysis: collecting statistics on browsers, operating systems, and devices.
Bot detection: identifying bots that scrape data.
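The uses listed above can be illustrated with a rough classifier. This is only a sketch: the function name classify_client and the marker lists are assumptions made for illustration, and real bot detection relies on much more than substring checks.

```python
# Hypothetical marker lists; real-world detection needs far more signals.
BOT_MARKERS = ("bot", "crawler", "spider", "python-requests", "curl")
MOBILE_MARKERS = ("mobile", "android", "iphone")


def classify_client(user_agent):
    """Return "bot", "mobile", or "desktop" based on simple substring checks."""
    ua = user_agent.lower()
    if any(marker in ua for marker in BOT_MARKERS):
        return "bot"
    if any(marker in ua for marker in MOBILE_MARKERS):
        return "mobile"
    return "desktop"


print(classify_client("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
```

A server could use a result like this to pick a mobile layout, count client types for statistics, or flag automated traffic for closer inspection.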
Structure of User Agents
Let's look at a typical User-Agent string:
```
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36
```
This string describes the browser, the operating system, and other components of the client. Its parts are:
Compatibility token: Mozilla/5.0 (kept for historical reasons; nearly all modern browsers send it)
Operating system and platform: (Windows NT 10.0; Win64; x64)
Rendering engine: AppleWebKit/537.36 (KHTML, like Gecko)
Browser name and version: Chrome/92.0.4515.107
Additional compatibility token: Safari/537.36
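The structure described above can be taken apart mechanically. The sketch below is one possible approach, not a complete parser: the names TOKEN_PATTERN and split_user_agent are hypothetical, and the regular expression only assumes that the string consists of product/version tokens and parenthesised comments.

```python
import re

# Matches either a "product/version" token or a parenthesised comment.
TOKEN_PATTERN = re.compile(
    r"(?P<product>[^\s/()]+)/(?P<version>[^\s()]+)|\((?P<comment>[^)]*)\)"
)


def split_user_agent(user_agent):
    """Split a User-Agent string into (product, version) pairs and comments."""
    products = []
    comments = []
    for match in TOKEN_PATTERN.finditer(user_agent):
        if match.group("comment") is not None:
            comments.append(match.group("comment"))
        else:
            products.append((match.group("product"), match.group("version")))
    return products, comments


ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"
)
products, comments = split_user_agent(ua)
print(products)
print(comments)
```

For the example string this yields four product tokens (Mozilla, AppleWebKit, Chrome, Safari) and two comments, matching the five parts listed above.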
1 Setting the User Agent in Python
With the requests library, you can set the User-Agent header manually when sending requests to websites.
Installation
Install the requests library with the following command:
```
pip install requests
```
Setting a Simple User Agent
The following function shows how to set the User-Agent when sending a request to a website:
```python
import requests


def get_page_with_user_agent(url, user_agent):
    """Send a request to the given URL with a custom User-Agent."""
    headers = {"User-Agent": user_agent}  # set the user agent
    response = requests.get(url, headers=headers)  # send the GET request
    return response


# User-Agent and URL for testing
url = "https://httpbin.org/headers"
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"
response = get_page_with_user_agent(url, user_agent)

# Print the response
print(response.json())
```
Breakdown:
headers = {"User-Agent": user_agent}: creates the headers dictionary that carries the User-Agent value.
requests.get(url, headers=headers): sends a GET request to the given URL with the headers parameter, so the request includes the custom User-Agent.
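The same idea also works without third-party packages. The sketch below uses the standard library's urllib.request instead of requests; the helper name build_request and the User-Agent value "my-script/1.0" are assumptions chosen for illustration. It only builds the request object, so it runs without network access.

```python
import urllib.request


def build_request(url, user_agent):
    """Build (but do not send) a request that carries a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request("https://httpbin.org/headers", "my-script/1.0")

# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

To actually send it, the request object would be passed to urllib.request.urlopen.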
2 Choosing a Random Agent from a List of User Agents
Sometimes it is useful to work with several User-Agent values, for example when crawling or analyzing a site. The functions below pick a random User-Agent from a list and send the request with it.
```python
import random

import requests

# List of user agents
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 14_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1",
]


def get_random_user_agent():
    """Return a random User-Agent from the list."""
    return random.choice(user_agents)


def get_page_random_user_agent(url):
    """Send a request to the URL with a random User-Agent."""
    user_agent = get_random_user_agent()
    headers = {"User-Agent": user_agent}
    response = requests.get(url, headers=headers)
    return response


# Test with a URL and a random User-Agent
url = "https://httpbin.org/headers"
response = get_page_random_user_agent(url)

# Print the response
print(response.json())
```
headers = {"User-Agent": user_agent}: sets the User-Agent header to the randomly chosen value, which is then used in the request.
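Random selection is awkward to test because each run can give a different result. One option, sketched here under the assumption that a helper named pick_user_agent accepts an optional random number generator, is to pass in a seeded random.Random instance so the sequence becomes repeatable.

```python
import random


def pick_user_agent(user_agents, rng=random):
    """Pick one User-Agent; rng may be the random module or a random.Random instance."""
    return rng.choice(user_agents)


# Short placeholder values stand in for real User-Agent strings.
agents = ["agent-A", "agent-B", "agent-C"]

# Two generators with the same seed produce the same sequence of choices.
first_run = [pick_user_agent(agents, random.Random(42)) for _ in range(3)]
second_run = [pick_user_agent(agents, random.Random(42)) for _ in range(3)]
print(first_run == second_run)  # True
```

In normal use the rng argument can be omitted, which falls back to the module-level random.choice as in the functions above.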
3 Browser Analysis via User-Agent
By analyzing the User-Agent you can extract information about the browser and the device. The following example shows a function that determines the browser and the operating system from a User-Agent string.
```python
def analyze_user_agent(user_agent):
    """Extract the browser and operating system from a User-Agent string."""
    browser = "Unknown"
    os_name = "Unknown"

    # Detect the browser (Chrome first: its User-Agent also contains "Safari")
    if "Chrome" in user_agent:
        browser = "Chrome"
    elif "Safari" in user_agent:
        browser = "Safari"
    elif "Firefox" in user_agent:
        browser = "Firefox"

    # Detect the operating system. "iPhone OS" is checked before "Mac OS X"
    # because iPhone User-Agents also contain "like Mac OS X".
    if "Windows" in user_agent:
        os_name = "Windows"
    elif "iPhone OS" in user_agent:
        os_name = "iOS"
    elif "Android" in user_agent:
        os_name = "Android"
    elif "Mac OS X" in user_agent:
        os_name = "Mac OS X"

    return browser, os_name


# Analyze a User-Agent
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"
browser, os_name = analyze_user_agent(user_agent)
print("Browser:", browser)
print("Operating system:", os_name)
```
Breakdown:
if "Chrome" in user_agent: checks whether the User-Agent contains the text "Chrome" to identify the browser. Chrome is tested before Safari because Chrome's User-Agent also contains "Safari".
if "Windows" in user_agent: checks whether the User-Agent contains "Windows" to identify the operating system.
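Substring checks tell you which browser is present but not which version. A possible extension, shown here as a sketch, uses regular expressions to capture the version as well. The names BROWSER_PATTERNS and detect_browser_version are hypothetical, and the pattern list covers only a few common browsers.

```python
import re

# Order matters: Edge and Chrome User-Agents also contain "Safari".
BROWSER_PATTERNS = [
    ("Edge", re.compile(r"Edg/([\d.]+)")),
    ("Chrome", re.compile(r"Chrome/([\d.]+)")),
    ("Firefox", re.compile(r"Firefox/([\d.]+)")),
    ("Safari", re.compile(r"Version/([\d.]+).*Safari/")),
]


def detect_browser_version(user_agent):
    """Return (browser name, version) or ("Unknown", None) if nothing matches."""
    for name, pattern in BROWSER_PATTERNS:
        match = pattern.search(user_agent)
        if match:
            return name, match.group(1)
    return "Unknown", None


ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"
)
print(detect_browser_version(ua))  # ('Chrome', '92.0.4515.107')
```

For Safari the version is read from the Version/ token, since the number after Safari/ refers to the WebKit build rather than the browser release.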
4 Checking the User Agent in the Website's Response
A server receives the User-Agent along with every request. The https://httpbin.org/headers endpoint returns the request headers it received as JSON, so we can use it to see which User-Agent we actually sent.
```python
import requests


def check_user_agent_on_site(url, user_agent):
    """Send a request to the URL and check the User-Agent the server received."""
    headers = {"User-Agent": user_agent}
    response = requests.get(url, headers=headers)
    response_json = response.json()

    # Print the User-Agent found in the JSON response
    print("Sent User-Agent:", user_agent)
    print("User-Agent returned by the server:", response_json["headers"]["User-Agent"])


# URL and User-Agent for testing
url = "https://httpbin.org/headers"
user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
check_user_agent_on_site(url, user_agent)
```
Breakdown:
response_json["headers"]["User-Agent"]: reads the User-Agent value from the JSON response so that it can be compared with the one that was sent.
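The check above depends on an external service. For offline experiments, a similar round trip can be sketched with the standard library alone: a throwaway local server echoes the User-Agent it received. The names EchoUserAgentHandler and echoed_user_agent, and the value "my-script/1.0", are assumptions made for this illustration.

```python
import json
import socketserver
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler


class EchoUserAgentHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that returns the received User-Agent as JSON."""

    def do_GET(self):
        payload = {"headers": {"User-Agent": self.headers.get("User-Agent")}}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def echoed_user_agent(user_agent):
    """Start a local server, send one request, and return the echoed User-Agent."""
    server = socketserver.TCPServer(("127.0.0.1", 0), EchoUserAgentHandler)  # port 0: OS picks one
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/headers"
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        # An empty ProxyHandler makes sure the local request bypasses any proxy.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request, timeout=5) as response:
            data = json.load(response)
        return data["headers"]["User-Agent"]
    finally:
        server.shutdown()
        server.server_close()


print(echoed_user_agent("my-script/1.0"))
```

If the value printed matches the one passed in, the header reached the server unchanged.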
5 Rotating Requests Across User Agents
The following function picks the next User-Agent in order for each request, cycling through the list rather than choosing at random, and sends the request to the site.
```python
import itertools

import requests

# Wrap the user_agents list (defined in section 2) in itertools.cycle for round-robin use
user_agents_cycle = itertools.cycle(user_agents)


def get_page_cyclic_user_agent(url):
    """Send a request using the next User-Agent in the cycle."""
    user_agent = next(user_agents_cycle)
    headers = {"User-Agent": user_agent}
    response = requests.get(url, headers=headers)
    return response


# Send requests with rotating User-Agents
url = "https://httpbin.org/headers"
for _ in range(5):  # send 5 requests
    response = get_page_cyclic_user_agent(url)
    print(response.json())
```
Breakdown:
user_agents_cycle = itertools.cycle(user_agents): itertools.cycle creates an iterator that loops over the user_agents list endlessly, starting again from the first item after the last.
user_agent = next(user_agents_cycle): selects the next User-Agent each time a new request is sent.
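The rotation order can be observed without sending any requests. The short sketch below uses placeholder strings (agent-A, agent-B, agent-C, chosen for illustration) in place of real User-Agent values.

```python
import itertools

# Placeholder values stand in for real User-Agent strings.
agents = ["agent-A", "agent-B", "agent-C"]
rotation = itertools.cycle(agents)

# Take five values: after the third, the cycle starts again from the first.
first_five = [next(rotation) for _ in range(5)]
print(first_five)  # ['agent-A', 'agent-B', 'agent-C', 'agent-A', 'agent-B']
```

Unlike random selection, this guarantees that every agent in the list is used equally often over time.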
6 Full Program
The following program combines all of the functions above: it sends requests to a website with different User-Agent values, identifies the browser and operating system, and rotates through the User-Agent list.
```python
import itertools
import random

import requests

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 14_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1",
]
user_agents_cycle = itertools.cycle(user_agents)


def get_random_user_agent():
    return random.choice(user_agents)


def analyze_user_agent(user_agent):
    browser = "Unknown"
    os_name = "Unknown"
    if "Chrome" in user_agent:
        browser = "Chrome"
    elif "Safari" in user_agent:
        browser = "Safari"
    elif "Firefox" in user_agent:
        browser = "Firefox"
    # "iPhone OS" is checked before "Mac OS X" because iPhone
    # User-Agents also contain "like Mac OS X".
    if "Windows" in user_agent:
        os_name = "Windows"
    elif "iPhone OS" in user_agent:
        os_name = "iOS"
    elif "Android" in user_agent:
        os_name = "Android"
    elif "Mac OS X" in user_agent:
        os_name = "Mac OS X"
    return browser, os_name


def get_page_random_user_agent(url):
    headers = {"User-Agent": get_random_user_agent()}
    return requests.get(url, headers=headers)


def get_page_cyclic_user_agent(url):
    headers = {"User-Agent": next(user_agents_cycle)}
    return requests.get(url, headers=headers)


def check_user_agent_on_site(url, user_agent):
    headers = {"User-Agent": user_agent}
    response = requests.get(url, headers=headers)
    response_json = response.json()
    print("Sent User-Agent:", user_agent)
    print("User-Agent returned by the server:", response_json["headers"]["User-Agent"])


# Call the functions against the test URL
url = "https://httpbin.org/headers"

random_response = get_page_random_user_agent(url)
print("Request with a random User-Agent:", random_response.json())

for _ in range(5):
    cyclic_response = get_page_cyclic_user_agent(url)
    print("Request with a cycled User-Agent:", cyclic_response.json())

user_agent = user_agents[0]
browser, os_name = analyze_user_agent(user_agent)
print("Browser:", browser)
print("Operating system:", os_name)

check_user_agent_on_site(url, user_agent)
```
With this program you can practice setting and changing User-Agent values, analyzing them, and using them in rotation.