What Is Googlebot, or a Crawler?

Crawling (as done by a robot or spider) is a general term for automatically finding and scanning websites by following links from one page to all the others. Google's main crawler is called Googlebot.
Here is some information about Google's crawlers.

Googlebot:
Googlebot (also called a spider) is Google's web search crawler. It discovers new and updated pages across websites and adds them to the Google index. The World Wide Web holds billions of pages, so computer programs use algorithms to tell Googlebot which sites to crawl, how often, and how many pages to fetch from each site.
Googlebot's process starts by checking the list of pages in a site's sitemap, which the site's webmaster provides. As it crawls, it also picks up pages that were not listed in the sitemap. That is why the best way to get a site into Google's search engine is for the webmaster to create a sitemap and submit it to Google.
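As a rough illustration of what a sitemap contains, here is a minimal Python sketch that builds one from a list of page URLs. The `build_sitemap` helper and the `example.com` URLs are made up for this example; only the `urlset`, `url`, and `loc` elements and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemap XML string listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages, used only for illustration.
pages = ["https://example.com/", "https://example.com/about"]
print(build_sitemap(pages))
```

The resulting file is usually saved as sitemap.xml in the site root and then submitted through the webmaster account.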

How to Block Googlebot from Accessing Your Site:
Once content is published, it is not really possible to keep it secret from crawlers. Even if the webmaster later removes the content, its link may keep showing up in search results for a while. So if you do not want Googlebot to crawl your site, robots.txt is the best of the many options available. With it you can block any of the files and folders that exist on your server.
After a webmaster creates a robots.txt file, Googlebot picks up the changes within a short time. If it still crawls the blocked content, check the location of robots.txt: it must always be in the top-level (root) directory of the server.
To prevent "file not found" error messages in your web server log, create an empty file named robots.txt.
To stop Googlebot from following any of the links on a page, use the nofollow robots meta tag.
To stop Googlebot from following one individual link, add the rel="nofollow" attribute to that link.
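Because the file has to sit in the root directory, its address can be worked out from any page URL on the same site. The sketch below shows one way to do that in Python; `robots_txt_url` is a hypothetical helper name and the URL is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://example.com/blog/post.html?id=7"))
# prints https://example.com/robots.txt
```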
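To see the difference between a blocking robots.txt and an empty one, the following sketch feeds both to Python's standard `urllib.robotparser`. That module only approximates how Googlebot reads the file, and the rules and URL here are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

def parse_rules(text):
    """Parse robots.txt content given as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

# A file that blocks Googlebot from the whole site.
block_googlebot = parse_rules("User-agent: Googlebot\nDisallow: /\n")
# An empty file: it silences log errors but blocks nothing.
empty_file = parse_rules("")

url = "https://example.com/page.html"
print(block_googlebot.can_fetch("Googlebot", url))  # False
print(empty_file.can_fetch("Googlebot", url))       # True
```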
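If you want to check which links on a page already carry the attribute, a small audit script can help. This is only a sketch built on Python's standard `html.parser`; the `LinkAudit` class and the sample markup are invented for the example.

```python
from html.parser import HTMLParser

class LinkAudit(HTMLParser):
    """Collect every link on a page and whether it is marked nofollow."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, is_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens.
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append((attr.get("href"), "nofollow" in rel_tokens))

html = '<a href="/about">About</a> <a href="https://example.org/" rel="nofollow">Ad</a>'
audit = LinkAudit()
audit.feed(html)
print(audit.links)
# prints [('/about', False), ('https://example.org/', True)]
```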
How to Make Sure Your Website Is Crawlable
Googlebot discovers most websites by jumping from a link on one page to the next page. If it runs into a problem while crawling from one page to another, it reports the problem under Crawl Errors in the webmaster account. A webmaster who checks that report regularly can find these problems and fix them.
If you want content from an Ajax application to appear in search results, build all Ajax-based content so that it is crawlable and indexable.
If your robots.txt file is working properly but your site still gets little traffic, look for other causes related to your content.
Problems with Spammers and Other User Agents:
Googlebot's IP addresses change from time to time, so the best way to identify and verify Googlebot's access to your server is a reverse DNS lookup.
Googlebot and other search engine bots respect the webmaster's instructions in robots.txt and follow them, but spammers do not. If you notice spam activity on your server, report it to Google right away.
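A reverse DNS check is usually paired with a forward lookup to confirm the result. The sketch below shows that two-step idea in Python, assuming that genuine Googlebot hostnames end in googlebot.com or google.com, as Google's own verification guidance describes. The function name is hypothetical, the resolver arguments exist only so the logic can be tried without network access, and the sketch handles IPv4 only.

```python
import socket

# Assumption: hostname suffixes taken from Google's verification guidance.
GOOGLE_DOMAINS = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a Google host that maps back to ip."""
    try:
        # Step 1: reverse lookup, IP address -> hostname.
        hostname = reverse(ip)[0]
        if not hostname.lower().rstrip(".").endswith(GOOGLE_DOMAINS):
            return False
        # Step 2: forward lookup, hostname -> IP addresses, must include ip.
        return ip in forward(hostname)[2]
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) count as unverified.
        return False
```

Calling `is_verified_googlebot(client_ip)` on an address taken from your access log performs real DNS queries. A user-agent string alone proves nothing, because anyone can send it.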
Google has many user agents, and Feedfetcher is one of the better known. It only acts on requests from users who have added a feed to their Google homepage or Google Reader. Because it is not an automated crawler, it does not follow robots.txt instructions either. If you want to stop Feedfetcher from fetching content on your server, configure the server to send a 404 or 410 error message to that user agent.
Googlebot handles newly published news, images, audio, and video collections in the same way, so the methods described above also keep any of that data out of its crawl. The crawlers used for AdSense and AdWords, by contrast, update their data about once a week.
The AdSense crawler only attempts to access URLs that have been requested.
The crawler also attempts pages that are reached through a redirect. Google cannot crawl a webmaster's site on demand and add it to search, because crawling is done automatically by bots (computer programs). So if a webmaster changes a page, the change may take 1 to 2 weeks to show up in the index.
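How you return a 410 depends on your server. As one possible approach, here is a small WSGI middleware sketch in Python. The names `block_feedfetcher` and `site` are invented for the example, and matching on the substring "feedfetcher" in the User-Agent header is an assumption about how the agent identifies itself.

```python
def block_feedfetcher(app):
    """WSGI middleware: answer 410 Gone to Feedfetcher, pass everything else on."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        # Assumption: the agent's User-Agent header contains "Feedfetcher".
        if "feedfetcher" in user_agent.lower():
            start_response("410 Gone", [("Content-Type", "text/plain")])
            return [b"Gone"]
        return app(environ, start_response)
    return middleware

def site(environ, start_response):
    """Placeholder application standing in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]

application = block_feedfetcher(site)
```

Any WSGI server can run `application`. On Apache or nginx the same effect would come from a rule keyed on the User-Agent header.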
How to Use Robots.txt?
If a webmaster wants Google to be able to crawl all of the site's pages, there is no need for robots.txt or any similar file. But if you want to block all or some of your pages so that your private content is not accessed, you need to provide a robots.txt file with rules for user agents such as Googlebot. For example, if you want all of your pages and content in Google search and also want AdSense ads displayed on them, you do not need a robots.txt file. Likewise, if you block your content from Googlebot, Google's other user agents will honor that block as well.
If you want more specific control, for example you want your pages to appear in Google search but do not want the images in your personal directory to be crawled, restrict the relevant user agent in robots.txt so that it leaves your personal images out of search, like this:
User-agent: Googlebot
Disallow:
User-agent: Googlebot-Image
Disallow: /personal


The same method can be used to allow or disallow any specific crawler.
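The per-crawler rules above can be tried out with Python's standard `urllib.robotparser`. One caveat for this sketch: that module applies the first group whose name matches the user agent, in file order, whereas Google picks the most specific matching group regardless of order. The image group is therefore listed first here so the Python parser gives the intended answer. The file paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Same rules as in the article, with the more specific group first
# because urllib.robotparser uses the first matching group.
RULES = """\
User-agent: Googlebot-Image
Disallow: /personal

User-agent: Googlebot
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

print(parser.can_fetch("Googlebot", "/personal/photo.jpg"))        # True
print(parser.can_fetch("Googlebot-Image", "/personal/photo.jpg"))  # False
print(parser.can_fetch("Googlebot-Image", "/blog/photo.jpg"))      # True
```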
Robots Meta Tag:
Some pages use different meta tags for different robots so that each crawler can easily pick out the instructions meant for it, like this:
<meta name="robots" content="nofollow"><meta name="googlebot" content="noindex">

In this situation Googlebot uses the sum of the negative directives, so it follows both noindex and nofollow. If you are just a blogger who wants to earn revenue from your content, never block the robots.
For more information, visit: Google Crawling Control Process
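The way the two meta tags add up for Googlebot can be sketched in Python by collecting the directives from both the generic robots tag and the crawler-specific tag. The `RobotsMeta` class is a made-up helper built on the standard `html.parser`. It models only the combining rule described above, not Google's full handling.

```python
from html.parser import HTMLParser

class RobotsMeta(HTMLParser):
    """Gather directives from the generic and the crawler-specific meta tags."""

    def __init__(self, crawler="googlebot"):
        super().__init__()
        self.names = {"robots", crawler.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in self.names:
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )

page = '<meta name="robots" content="nofollow"><meta name="googlebot" content="noindex">'
meta = RobotsMeta()
meta.feed(page)
print(sorted(meta.directives))
# prints ['nofollow', 'noindex']
```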


Admin

Tayyib Ahsan is an entrepreneur and freelance technology writer. His passion is helping others gain knowledge and success in blogging, marketing, and online shopping. He also offers e-currency exchange services for individuals and companies worldwide. Get in touch with him on Twitter or Facebook.
