Strange URLs indexed: www.***.com/ar/be/it translation mess

Post by HappyEscorts » Mon Sep 05, 2011 6:40 pm

First of all, thank you very much for your great work, Edvard, especially for the new caching system!

Question:
Google is indexing strange URLs that have several language codes stacked in the path, such as:

http://www.HappyEscorts.com/it/dk/
http://www.HappyEscorts.com/ar/be/it
http://www.HappyEscorts.com/sr/ja
http://www.HappyEscorts.com/ru/ko/id/zh-CN

Clicking any such URL results in a terrifying translation mess (the pages mix Arabic, Russian, and Japanese content).

These are the correct URLs (and everything works perfectly with them):

www.HappyEscorts.com
www.HappyEscorts.com/de
www.HappyEscorts.com/fr
www.HappyEscorts.com/it

Additionally, this issue results in tens of thousands of 404 errors (according to Google Webmaster Tools).
None of these URLs appear in the sitemaps we submit via Webmaster Tools.

Any advice is appreciated, thanks.

System:
GTranslate PRO v. 1.5.x.26
Artio SEF v. 3.7.4
Joom!Fish v. 2.1.7
on Joomla 1.5.20
PHP 5.3.3

Regards,
Peter
HappyEscorts
 
Posts: 5
Joined: Sun Jan 02, 2011 4:21 pm

Re: Strange URLs indexed: www.***.com/ar/be/it translation

Post by alt_f4 » Tue Sep 06, 2011 2:09 pm

Hands-on solution:

Create a robots.txt in your (Joomla) root, e.g.:

Code:
User-agent: *
Disallow: /it/dk/
Disallow: /ar/be/it/
Disallow: /sr/ja/
Disallow: /ru/ko/id/zh-CN/
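
As a quick sanity check of rules like the ones above, here is a minimal Python sketch using the standard library's urllib.robotparser. The host www.example.com and the helper name is_allowed are placeholders for illustration, not part of the original post. robots.txt rules are prefix matches, so an entry ending in a slash would not cover the slash-less variant of the same URL (for example /ar/be/it).

```python
from urllib.robotparser import RobotFileParser

# The rules as posted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /it/dk/
Disallow: /ar/be/it/
Disallow: /sr/ja/
Disallow: /ru/ko/id/zh-CN/
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, base="http://www.example.com"):
    """Return True if a generic crawler ("*") may fetch the given path.

    `base` is a placeholder host; only the path matters for matching.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", base + path)


if __name__ == "__main__":
    for path in ["/it/dk/", "/ar/be/it", "/ar/be/it/", "/it/"]:
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Keep in mind that this only shows what a crawler is asked not to fetch; it says nothing about how quickly already indexed URLs drop out of a search index.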


Furthermore, you can request removal of the error-causing pages in Google Webmaster Tools (http://www.google.com/webmasters), which you already know.

To request removal of the outdated cached version of the page from search results:

1. Verify your ownership of the site in Webmaster Tools.
2. On the Webmaster Tools home page, click the site you want.
3. On the Dashboard, click Site configuration in the left-hand navigation.
4. Click Crawler access, and then click Remove URL.
5. Click New removal request.
6. Type the URL of the page you want removed, and then click Continue. Note that the URL is case-sensitive: you will need to submit the URL using exactly the same characters and the same capitalization that the site uses.
7. Select "The page has changed and Google's cached version is out of date."
8. Click Submit Request.


Regards, alt_f4

P.S. You should upgrade to Joomla 1.5.23 because of security issues in Joomla 1.5.20: http://www.joomla.org/download.html
alt_f4
 
Posts: 68
Joined: Tue May 31, 2011 2:43 pm

Re: Strange URLs indexed: www.***.com/ar/be/it translation

Post by Edvard » Tue Sep 06, 2011 4:04 pm

Hi,

You are probably using an old version. You will need to upgrade to the latest version, or use these rewrite rules in your .htaccess file instead.

Code:
# gtranslate config
# Collapse stacked language codes with 301 redirects: /ar/be/it -> /ar/it, then /ar/it -> /ar/
RewriteRule ^([a-z]{2}|zh-CN|zh-TW)/([a-z]{2}|zh-CN|zh-TW)/(.*)$ /$1/$3 [R=301,L]
RewriteRule ^([a-z]{2}|zh-CN|zh-TW)/([a-z]{2}|zh-CN|zh-TW)$ /$1/ [R=301,L]
# Route single-language URLs to the translator (the file check applies only to the next rule)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([a-z]{2}|zh-CN|zh-TW)/(.*)$ /gtranslate/translate.php?lang=$1&url=$2 [L,QSA]
RewriteRule ^([a-z]{2}|zh-CN|zh-TW)$ /gtranslate/translate.php?lang=$1 [L,QSA]
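
To see what the two redirect rules do to a URL with stacked language codes, here is a rough Python simulation. It is only a sketch of the regex logic, not of Apache itself: the function names redirect_once and follow_redirects are made up for illustration, and it assumes the rules live in the site root's .htaccess, where patterns are matched against the path without its leading slash.

```python
import re

# Same language-code alternation as in the rewrite rules.
LANG = r"(?:[a-z]{2}|zh-CN|zh-TW)"

# Rule 1: /<lang>/<lang>/<rest>  ->  /<lang>/<rest>
STACKED_WITH_REST = re.compile(rf"^({LANG})/{LANG}/(.*)$")
# Rule 2: /<lang>/<lang>         ->  /<lang>/
STACKED_EXACT = re.compile(rf"^({LANG})/{LANG}$")


def redirect_once(path):
    """Return the 301 target for one request, or None if no redirect rule matches."""
    rel = path.lstrip("/")  # .htaccess patterns see the path without the leading slash
    m = STACKED_WITH_REST.match(rel)
    if m:
        return f"/{m.group(1)}/{m.group(2)}"
    m = STACKED_EXACT.match(rel)
    if m:
        return f"/{m.group(1)}/"
    return None


def follow_redirects(path, limit=10):
    """Follow redirects the way a crawler would, returning every URL visited."""
    chain = [path]
    for _ in range(limit):
        target = redirect_once(chain[-1])
        if target is None:
            break
        chain.append(target)
    return chain


if __name__ == "__main__":
    for start in ["/ar/be/it", "/ru/ko/id/zh-CN", "/it/dk/", "/de/page.html"]:
        print(" -> ".join(follow_redirects(start)))
```

Each request strips one extra code, so a deeply stacked URL goes through several 301 hops before it settles on a single language prefix. One thing the sketch makes visible: a genuine path whose second segment happens to be two lowercase letters would be collapsed in the same way.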
Regards,

Edvard Ananyan - GTranslate Team

Please leave your feedback on your CMS plugin directory. It is very important for us!
Google Translate Joomla
Google Translate WordPress
Google Translate Drupal
Edvard
Site Admin
 
Posts: 4183
Joined: Mon Jun 28, 2010 1:54 pm
Location: Yerevan, Armenia

Re: Strange URLs indexed: www.***.com/ar/be/it translation

Post by HappyEscorts » Tue Sep 06, 2011 10:50 pm

Thank you, Edvard,
I will try the new rewriting rules.

And thanks to you, alt_f4!
However, HappyEscorts has some 53,000 real pages (not counting gtranslated ones); some 43,000 pages are indexed by Google and roughly 10,000 by Bing. How can I find out which of these indexed pages have such strange URLs?

And even if I had a list of those URLs, should I add them all to robots.txt (probably a few thousand Disallow lines)? It would also be almost impossible to remove all these wrong URLs by hand via Google Webmaster Tools or Bing Webmaster.
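
One possible way to pick the strange URLs out of a larger list is sketched below in Python. It assumes you already have the URLs as full http:// addresses, for example one per line in a file exported from server logs or a crawl-error report (that input is an assumption, as are the function names). The pattern reuses the language-code alternation from the rewrite rules and flags any path that starts with two such codes in a row.

```python
import re
from urllib.parse import urlsplit

# Same language-code alternation as in the gtranslate rewrite rules.
LANG = r"(?:[a-z]{2}|zh-CN|zh-TW)"

# A path is "stacked" when it starts with two language codes in a row.
STACKED = re.compile(rf"^/{LANG}/{LANG}(?:/|$)")


def has_stacked_codes(url):
    """True if the URL path begins with two or more language codes."""
    return bool(STACKED.match(urlsplit(url).path))


def filter_stacked(urls):
    """Keep only the URLs whose path starts with stacked language codes."""
    return [u for u in urls if has_stacked_codes(u)]


if __name__ == "__main__":
    sample = [
        "http://www.HappyEscorts.com/it/dk/",
        "http://www.HappyEscorts.com/ar/be/it",
        "http://www.HappyEscorts.com/de",
        "http://www.HappyEscorts.com/fr/some-page.html",
    ]
    for url in filter_stacked(sample):
        print(url)
```

URLs without a scheme (such as www.HappyEscorts.com/de) would need http:// prepended first, because urlsplit otherwise treats the host name as part of the path.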

Any advice?
Does anybody else have similarly malformed URLs in Google's or Bing's index?

Regards,
Peter

Re: Strange URLs indexed: www.***.com/ar/be/it translation

Post by Edvard » Wed Sep 07, 2011 2:50 pm

If you just change the rewrite rules it will be fixed over time and the strange URLs will disappear from the index.
Regards,

Edvard Ananyan - GTranslate Team


Re: Strange URLs indexed: www.***.com/ar/be/it translation

Post by HappyEscorts » Wed Sep 07, 2011 3:54 pm

edo888 wrote: If you just change the rewrite rules it will be fixed over time and the strange URLs will disappear from the index.


I will do so, Edvard.
However, could you please add a line to the rewrite rules? We use Joom!Fish for DE, so DE should not be translated by GTranslate.

Thank you in advance
Regards
Peter

Re: Strange URLs indexed: www.***.com/ar/be/it translation

Post by Edvard » Fri Sep 09, 2011 12:43 am

Code:
# gtranslate config
RewriteRule ^([a-z]{2}|zh-CN|zh-TW)/([a-z]{2}|zh-CN|zh-TW)/(.*)$ /$1/$3 [R=301,L]
RewriteRule ^([a-z]{2}|zh-CN|zh-TW)/([a-z]{2}|zh-CN|zh-TW)$ /$1/ [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !^/de
RewriteRule ^([a-z]{2}|zh-CN|zh-TW)/(.*)$ /gtranslate/translate.php?lang=$1&url=$2 [L,QSA]
RewriteCond %{REQUEST_URI} !^/de
RewriteRule ^([a-z]{2}|zh-CN|zh-TW)$ /gtranslate/translate.php?lang=$1 [L,QSA]
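
To check which requests the last two rules would hand to the translator once /de is excluded, here is a small Python model of just those two rules and their conditions. It is a sketch under assumptions: translate_target and the is_existing_file flag are illustrative names, the redirect rules above them are not modelled, and the query-string merging done by QSA is left out.

```python
import re

# Same language-code alternation as in the rewrite rules.
LANG = r"(?:[a-z]{2}|zh-CN|zh-TW)"

WITH_PATH = re.compile(rf"^({LANG})/(.*)$")  # e.g. fr/some-page.html
BARE = re.compile(rf"^({LANG})$")            # e.g. fr


def translate_target(request_uri, is_existing_file=False):
    """Return the internal translate.php target, or None if the request is left alone.

    `is_existing_file` stands in for the `!-f` check, which Apache performs
    against the real filesystem.
    """
    # RewriteCond %{REQUEST_URI} !^/de  (present on both translator rules)
    if request_uri.startswith("/de"):
        return None
    rel = request_uri.lstrip("/")  # .htaccess patterns see no leading slash
    m = WITH_PATH.match(rel)
    # RewriteCond %{REQUEST_FILENAME} !-f  (guards only the first translator rule)
    if m and not is_existing_file:
        return f"/gtranslate/translate.php?lang={m.group(1)}&url={m.group(2)}"
    m = BARE.match(rel)
    if m:
        return f"/gtranslate/translate.php?lang={m.group(1)}"
    return None


if __name__ == "__main__":
    for uri in ["/fr/some-page.html", "/it", "/de", "/de/some-page.html"]:
        print(uri, "->", translate_target(uri))
```

In this model, /de and everything below it falls through untouched, which is what leaves those URLs to Joom!Fish.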
Regards,

Edvard Ananyan - GTranslate Team


Re: Strange URLs indexed: www.***.com/ar/be/it translation

Post by HappyEscorts » Mon Sep 12, 2011 12:03 pm

Thank you, Edvard!
We've found a different solution that also avoids duplicate content caused by the trailing slash:
www.HappyEscorts.com/de vs. www.HappyEscorts.com/de/

What do you think?

Code:
RewriteRule ^([a-z]{2}|zh-CN|zh-TW)$ /$1/ [R=301,L]
RewriteRule ^([a-z]{2}|zh-CN|zh-TW)/([a-z]{2}|zh-CN|zh-TW)/(.*)$ /$1/$3 [R=301,L]
RewriteRule ^([a-z]{2}|zh-CN|zh-TW)/([a-z]{2}|zh-CN|zh-TW)$ /$1/ [R=301,L]
RewriteCond %{REQUEST_URI} !^/de
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([a-z]{2}|zh-CN|zh-TW)/(.*)$ /gtranslate/translate.php?lang=$1&url=$2 [L,QSA]
RewriteRule ^([a-z]{2}|zh-CN|zh-TW)$ /gtranslate/translate.php?lang=$1 [L,QSA]

Re: Strange URLs indexed: www.***.com/ar/be/it translation

Post by Edvard » Mon Sep 12, 2011 6:52 pm

I don't understand what you did and how that should stop multiple language codes in the URL.
Regards,

Edvard Ananyan - GTranslate Team


Re: Strange URLs indexed: www.***.com/ar/be/it translation

Post by nerdcode » Sun Oct 20, 2013 4:37 pm

I think the whole URL problem should be solved the other way round.
We once had problems using the fix for our Escorts in Germany URL.
nerdcode
 
Posts: 2
Joined: Sun Oct 20, 2013 4:33 pm

Re: Strange URLs indexed: www.***.com/ar/be/it translation

Post by Yana » Sun Oct 20, 2013 11:06 pm

Hi,
You can try adding a rewrite rule to your .htaccess file that appends a trailing slash to the URLs.
Regards,

Yana Ghahramanyan - GTranslate Team

Yana
 
Posts: 4134
Joined: Thu Jan 12, 2012 6:21 pm

