Coding with Jesse

Parse Accept-Language to detect a user's language

May 4th, 2008

I'm an English-speaking Canadian living in Germany. Quite often I go to a website like Google or Kayak and find myself looking at a German version of the site.

Okay, I do live in Germany, but why assume that everyone within Germany speaks German? What about visitors from other countries, or even people living here that would prefer to use another language?

What must be happening is these sites are taking my IP address, looking up the geographical location of that address, and choosing the official language for that country. This may work most of the time, but there is an even easier way to choose a language.

Most browsers send an Accept-Language header. For example, mine is set to:

en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2

What this basically says is that I prefer (in decreasing order of preference) Canadian English, generic English, US English, German spoken in Germany, and lastly generic German. Any web site I visit is capable of looking at this list and deciding what language I would prefer.

Of course, no matter what assumptions you make about a visitor, give them a chance to change their language if needed. For example, if you use an Internet cafe in Berlin, you shouldn't be stuck viewing websites in German!

One really nice thing: I often see Google Ads and other geographically targeted ads in German, and this makes ignoring the ads much easier! :)

Update: I was inspired to throw together a quick Accept-Language parser in PHP:

$langs = array();

if (isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])) {
    // break up string into pieces (languages and q factors)
    preg_match_all('/([a-z]{1,8}(-[a-z]{1,8})?)\s*(;\s*q\s*=\s*(1|0\.[0-9]+))?/i', $_SERVER['HTTP_ACCEPT_LANGUAGE'], $lang_parse);

    if (count($lang_parse[1])) {
        // create a list like "en" => 0.8
        $langs = array_combine($lang_parse[1], $lang_parse[4]);
    	
        // set default to 1 for any without q factor
        foreach ($langs as $lang => $val) {
            if ($val === '') $langs[$lang] = 1;
        }

        // sort list based on value	
        arsort($langs, SORT_NUMERIC);
    }
}

// look through sorted list and use first one that matches our languages
foreach ($langs as $lang => $val) {
    if (strpos($lang, 'de') === 0) {
        // show German site
    } else if (strpos($lang, 'en') === 0) {
        // show English site
    }
}

// show default site or prompt for language

This would produce the following structure for my Accept-Language string:

Array
(
    [en-ca] => 1
    [en] => 0.8
    [en-us] => 0.6
    [de-de] => 0.4
    [de] => 0.2
)

Comments

1 . Geert on May 4th, 2008

Geert

Good advice, indeed. Way simpler than looking for the geo location of an IP address.

I am only wondering about the reason why they once picked that content negotiation format for HTTP headers like Accept. Referring to your example, how would one parse the header easily to know that en-ca has a quality factor of 0.8? Exploding it on “;” or “,” does not really help.

2 . Geert on May 4th, 2008

Geert

Oh, wait! Now I see, en-ca does not have a quality factor of 0.8 but of 1 (by default) since a “q=” parameter has been omitted.

For some reason I misunderstood this content negotiation syntax for a while. But reading and re-reading the specs cleared things up: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

So, sorry for the confusion. Exploding on “,” is the way to go.

3 . Jesse Skinner on May 4th, 2008

Jesse Skinner

@Geert - You inspired me to throw together a parse script in PHP that deals with the q factor. Feel free to use and rewrite this as much as you like. (See above.)

4 . Susie on May 4th, 2008

Susie

Wow, that's exactly the way I feel too! I'm a native English speaker living in China and it drives me nuts when I go to some major websites and am automatically given the Chinese version. I can read a little Chinese, but sometimes it's even hard to find the link to go to the English-language version. It's enough to make me want to stop using the website!

5 . Matt on May 6th, 2008

Matt

I started using a geotargeting feature in our ad server that picks up the user's language preference and targets ads based on that instead of geolocation. I saw that and thought pretty much like you did. If you speak Spanish and live in Texas, then Spanish ads make more sense than if you speak English but live in Spain.

6 . Kevin on May 25th, 2008

Kevin

Thanks for that script, it's simple, effective and it works :)

I didn't want to write the code myself, so I asked Google and found my way here, so thanks again!

7 . Mei on June 20th, 2008

Mei

Hey, great script. I speak Welsh (http://en.wikipedia.org/wiki/Welsh_language) and English, and so my browser is wired for Welsh content, but it's rare that a site would recognise it...

Anyway, I hope to integrate this excellent piece of code into my future developments.

Diolch!

8 . Denis on June 27th, 2008

Denis

Thank you for the nice Accept-Language parsing code; it saved me a lot of time!

9 . Nikita on July 5th, 2008

Nikita

I've found many implementations of this parser but wasn't sure whether it's right to explode on ','. So I searched for the RFC and found this article. This code is much simpler than the others I've seen. Thanks for that and for the RFC link :)

10 . Sarah Lewis on July 16th, 2008

Sarah Lewis

Thanks for sharing this code! It's been very helpful for an application I'm coding.

11 . Wyrm on July 29th, 2008

Wyrm

Thanks!

12 . Brian Cherne on August 7th, 2008

Brian Cherne

The W3C has a nice FAQ on when to use the Accept-Language header. http://www.w3.org/International/questions/qa-accept-lang-locales

If all you had to know was language, I see this as being a great resource so long as you provide the user with an easy way to change languages - using an iconic or native language approach.

However, some of the web sites I've worked on have locale-specific pricing/availability... and IP address is simply the most reliable/unobtrusive way to enable this. While it's bad to assume what language a user speaks, it may be impractical to provide translations for all locale-specific options (even for a handful of languages).

13 . Jesse Skinner on August 7th, 2008

Jesse Skinner

@Brian - True, but you could also separate the localization (currency) stuff from the internationalization (language) stuff. So perhaps the pricing, currency and shipping could be based on the IP, whereas the language could be based on the Accept-Language header or whatnot.

You may have to - how else would you want to handle separate pricing for USA versus Canada, for example? Would you have separate language files for each?

14 . Brian Cherne on August 7th, 2008

Brian Cherne

@Jesse - I totally agree with you. Where it's possible, localization can (and should) be separated from internationalization.

My point was, however, that assuming products A, B, C are available in the US and products D, E, F are available in Germany, it may not be possible (time/budget-wise) to write English descriptions for products D, E, F or German descriptions for A, B, C. This would be especially true if the return on investment was negligible (i.e., non-German speakers in Germany being <1%).

All depends on how our clients do business. From what I've seen the major retailers have product availability based on region (North America, Europe, Asia/Pacific, ...). Then within each region there are (should be) two independent data sets: supported languages and supported locales (for currency/pricing, etc.). Unfortunately, I think too often language and locale are bundled together in an effort to keep things simple.

15 . Jesse Skinner on August 10th, 2008

Jesse Skinner

@Brian - yep that's totally true. Thanks for pointing that out.

16 . Naveed on September 12th, 2008

Naveed

How can I get rid of this automatic language detection setting? Every time I go to google.com it redirects to google.ae (based on my location). Is there a preference I can change?

Please advise in plain English and not in Programming terms.

regards,

17 . Jesse Skinner on September 13th, 2008

Jesse Skinner

@Naveed - If you look at the bottom of google.ae you will see a link to "Google.com in English" - click that and you'll be transported to the English google.com with no redirect :)

For anyone else's convenience, the link for all languages is:

http://www.google.com/ncr

18 . Lummo on September 26th, 2008

Lummo

Your code is really useful but it failed on some of the language settings I use. I traced the problem to the spec for Accept-Language:

Accept-Language = "Accept-Language" ":"
1#( language-range [ ";" "q" "=" qvalue ] )
language-range = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" )

The language-range parameter can be from 1 to 8 alpha characters for both the primary-tag and the subtag.

This can be accommodated by changing your regexp pattern to:

preg_match_all('/([a-z]{1,8}(-[a-z]{1,8})?)\s*(;\s*q\s*=\s*(1|0\.[0-9]+))?/i', $_SERVER['HTTP_ACCEPT_LANGUAGE'], $lang_parse);

regards,

Lummo

19 . Jesse Skinner on September 29th, 2008

Jesse Skinner

@Lummo - Thanks for that! I've made the adjustment to allow 1-8 characters in the primary/sub-tags.

20 . Lummo on October 7th, 2008

Lummo

You are welcome. Don't forget the "/i" (for case insensitive) that I slipped in there too. Some of those strings are upper case too.

21 . Jesse Skinner on October 11th, 2008

Jesse Skinner

Thanks again, Lummo!

22 . TarquinWJ on November 18th, 2008

TarquinWJ

What happens if a user has da,en;q=0 which I think is valid, meaning "I want Danish, but whatever you do, don't give me English!"?

A script that steps through the array (like yours) would see "en" and think it was OK to use it as a last resort, but q=0 means "give me anything except this" - so even Swahili would be better.

The nicest approach I can think of for that is to build two arrays, one of the positive, and one of the negative.
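The two-array idea can be sketched roughly as follows. This is only an illustration: `split_by_q()` is a hypothetical helper name, and it assumes the header has already been parsed into a "language => q" array by a parser that records q=0 rather than ignoring it.

```php
<?php
// Hypothetical helper (not from the article): split a "language => q"
// array into accepted languages (q > 0) and rejected ones (q = 0).
function split_by_q(array $langs): array
{
    $accepted = array();
    $rejected = array();
    foreach ($langs as $lang => $q) {
        if ((float) $q > 0) {
            $accepted[$lang] = (float) $q;
        } else {
            $rejected[] = $lang;
        }
    }
    // most preferred accepted language first
    arsort($accepted, SORT_NUMERIC);
    return array($accepted, $rejected);
}

// "da,en;q=0" parsed into an array by some earlier step (assumed)
list($accepted, $rejected) = split_by_q(array('da' => 1, 'en' => 0));
echo implode(',', array_keys($accepted)), ' | ', implode(',', $rejected), "\n"; // prints "da | en"
```

A site could then try the accepted list first and, when falling back, skip anything in the rejected list.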

23 . Lummo on November 20th, 2008

Lummo

I think that the answer to that is "it depends what you want to do with the information". At the moment the code returns the q values so it's easy to skip over or delete any Accept-Languages where it is zero if that's what you want to do.

If the user has q=0'ed all of the languages that you support then it might be best to offer up an apology before dropping back to some lingua franca. "sorry. We don't speak the same linguine!". The problem is what language to offer it in?

regards,

Lummo

24 . Malte Anglais on February 11th, 2009

Malte Anglais

Great article. Badly designed language redirects drive me insane. Especially when there's no obvious link in English to get back.

25 . Ries on February 15th, 2009

Ries

I think not even search engines work to spec in these cases. I live in Ecuador and often get a Spanish version even though my browser sends en as preferred; not even Google does this correctly.

From a search engine's point of view, Google also doesn't understand this and cannot index a website in three languages when the content is the same (but shown in a different language).

Some people break the spec here, including Google.

Ries

26 . Cristian on March 18th, 2009

Cristian

Hi, this is interesting, but can you help me? I'm not a programmer, and I want people who come to my site to be automatically redirected to their language (German if they are from de, and so on), but I have no idea how to do that.

I use a cms and the lang link looks like this:

www.mysite.com/index.php?en_home -for english

www.mysite.com/index.php?de_home -for german

I think you get the idea. Where can I find a script and instructions for automatically redirecting visitors to their language?

Best Regards

Cristian

27 . Lummo on March 18th, 2009

Lummo

So, at the moment your index.php is expecting to receive a parameter that specifies the language and you want to redirect to a page in that language? Is that right?

So index.php?en_home would end up at mysite.com/en_home.php?

There are a couple of ways of achieving this but both involve a degree of programming.

1) If your web server is Apache then you can use .htaccess redirect rules to do the redirection for you. You'll need to set up the .htaccess to match the parameter and redirect to the URL that you want to handle that language.

2) You can have your index.php file gather the parameter and then do a redirect to the URL that you want to handle that language. The PHP redirect is done by calling the header() function something like this:

header('Location: ' . $redirectTo, true); // Redirect to target

where $redirectTo contains the target URL.

Can I suggest that, if you can, you alter your URL parameter to be like this:

http://www.mysite.com/index.php?lang=en

That way the language is passed as a parameter value rather than the parameter being the value.

Hope that helps.

with best regards,

Lummo
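A minimal sketch of the redirect with a `lang` parameter, under some assumptions: `language_url()` is a hypothetical helper, the list of supported languages is made up for illustration, and the detected language would come from an Accept-Language parser like the one in the article.

```php
<?php
// Hypothetical helper (not from the article): build the redirect target
// for a detected language, falling back to a default when unsupported.
function language_url(string $detected, array $supported, string $default): string
{
    // use the two-letter primary tag, e.g. "de" from "de-DE"
    $primary = strtolower(substr($detected, 0, 2));
    $lang = in_array($primary, $supported, true) ? $primary : $default;
    return 'http://www.mysite.com/index.php?lang=' . urlencode($lang);
}

$redirectTo = language_url('de-DE', array('en', 'de'), 'en');
echo $redirectTo, "\n"; // prints "http://www.mysite.com/index.php?lang=de"

// In a real page, send the redirect before any output and then stop:
// header('Location: ' . $redirectTo, true, 302);
// exit;
```

Checking the value against a fixed list keeps arbitrary header content out of the URL.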

28 . Woody Gilk on May 6th, 2009

Woody Gilk

Thanks for your article! Based on your analysis and code provided, Kohana v3.0 will have a Request::accept_lang($lang) method. :)

29 . Nicholas Shanks on July 11th, 2009

Nicholas Shanks

I don't know where Lummo got those BNFs for the Accept-Language header, but they do not conform to BCP 47.

See page 4 of RFC 4646: http://tools.ietf.org/html/rfc4646

Something like (I do not write BNF for a living):

2*4ALPHA (["-" 4ALPHA] ("-" 2ALPHA (["-x"] "-" 3*8ALPHA)))

allowing tags like "sco-Latn-GB-x-lallans" or "en-oed":
language-Script-COUNTRY-x-dialect
the x prefix is for unofficial dialects (ebonics) and not needed for official ones (OED, Scouse)

In short, the regex needs to be amended something like this:

[a-zA-Z]{1,4}(-[A-Z][a-z]{1,3})?(-[a-zA-Z]{2})?(-x)?(-[a-zA-Z]{3,8})?

and drop the /i from the end.

4 character language codes are reserved and not currently used.
I am ignoring grandfathered tags (i-) and such, read RFC4646 and RFC4647 for the full details.

30 . Lummo on July 12th, 2009

Lummo

Hello,

You're right. I should have quoted the source. I believe that it was from RFC 2616 Hypertext Transfer Protocol -- HTTP/1.1, Section 14.4: (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4).

{quote}
The Accept-Language request-header field is similar to Accept, but restricts the set of natural languages that are preferred as a response to the request. Language tags are defined in section 3.10.

Accept-Language = "Accept-Language" ":"
1#( language-range [ ";" "q" "=" qvalue ] )
language-range = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" )
{/quote}

Section 3.10 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.10) defines the language tags.

{quote}
White space is not allowed within the tag and all tags are case- insensitive.
{/quote}

Looks like I was behind the times :-)

Cheers,

Lummo

31 . Lummo on July 12th, 2009

Lummo

Btw, RFC 4646 (page 5) says:

{quote}
The tags and their subtags, including private use and extensions, are to be treated as case insensitive: there exist conventions for the capitalization of some of the subtags, but these MUST NOT be taken to carry meaning.
{/quote}

so maybe the amendment could be:

[a-z]{1,4}(-[a-z]{1,4})?(-[a-z]{2})?(-x)?(-[a-z]{3,8})?/i

Cheers again,

Lummo

32 . Nicholas Shanks on July 13th, 2009

Nicholas Shanks

The problem with that is that the different groups may mistakenly match the wrong part, e.g.

zh-hant could be matched as $1 = zh, $2 = hant; or as $1 = zh, $5 = hant

The last part of the pattern, the "dialect" as I call it, is problematic. No new dialects are allowed to be defined that are fewer than 5 chars long, so it could be {5,8} except that there are extant cases such as en-OED (oxford english dictionary spellings) where as few as 3 letters are used.

we could either do [A-Z][a-z]{3} .../i for the script code, and allow three-char dialects, or [a-z]{4} for four char script codes, and [a-z]{5,8} for the dialect with a /i at the end.
The script code is only ever 4 chars, so it does not need to be {1,4}

33 . lilly on September 1st, 2009

lilly

Excuse me for being too "blond",
but can I implement this in a plain HTML website, and if so, how?
I have a website in 9 different language versions and would like a user to be redirected to the appropriate language version of the file they visited, depending on the language settings they chose in their browser.
Thank you in advance

34 . Lummo on September 1st, 2009

Lummo

It depends (of course!) on what facilities you have on the system that hosts the site(s). Apache? PHP? Ruby? Python? Tomcat? etc?

The Apache web server has a scheme for having URLs vectored to language-specific pages depending on the user's Accept-Language HTTP request header. You can read more at http://httpd.apache.org/docs/1.3/content-negotiation.html. This doesn't work too well for SEO for links though. It seems that all the href contents have to be the same for each language.

I am working on an easier and more flexible solution but this margin is too narrow to contain the details. I'd offer to talk with you offline but I don't know how to get in touch without posting an e-mail address or something.

Any suggestions?

Lummo

35 . tayfun on September 18th, 2009

tayfun

Hi, I've added a "(,|$)" to your regular expression because I was getting some junk in the header for some reason. I mean something like this: "tr-TR,tr;q=0.7,chrome://global/locale/intl.properties;q=0.3" A comma or an end of string does help a little in not recognizing junk. Thanks for your post, it helped me understand the discussion.
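A rough sketch of a stricter pattern in the same spirit. Anchoring only the end can still let the tail of a junk entry through, so this version assumes an entry must start at the beginning of the string or right after a comma, and must end at a comma or the end of the string. The function name is hypothetical, and the q-value part follows the RFC 2616 qvalue grammar rather than the pattern from the article.

```php
<?php
// Hypothetical helper (not from the article): accept only entries that
// fill the whole space between two commas; anything else is skipped.
function parse_accept_language_strict(string $header): array
{
    $pattern = '/(?:^|,)\s*([a-z]{1,8}(?:-[a-z]{1,8})?)\s*'
             . '(?:;\s*q\s*=\s*(0(?:\.[0-9]{0,3})?|1(?:\.0{0,3})?))?\s*(?=,|$)/i';
    preg_match_all($pattern, $header, $m);

    $langs = array();
    foreach ($m[1] as $i => $lang) {
        // an empty q capture means no q value was sent, so default to 1
        $langs[strtolower($lang)] = ($m[2][$i] === '') ? 1.0 : (float) $m[2][$i];
    }
    arsort($langs, SORT_NUMERIC);
    return $langs;
}

$header = 'tr-TR,tr;q=0.7,chrome://global/locale/intl.properties;q=0.3';
echo implode(',', array_keys(parse_accept_language_strict($header))), "\n"; // prints "tr-tr,tr"
```

Like the pattern in the article, this only allows one subtag and does not handle the `*` wildcard.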

36 . Taber on November 27th, 2009

Taber

Great article! I'm wondering how reliable/well-adopted across different browsers the Accept-Language header is. For example, IE5+, Firefox 1.x+, etc? I'm sure all modern browsers support it, but just wondering where the line is drawn, if any. Thanks.

37 . Nicholas Shanks on November 27th, 2009

Nicholas Shanks

@taber: Mosaic gained support for Accept-Language in version 2.4 (1992). Netscape/Firefox and IE both inherited this code and so support it since version 1.0 (1994-5). Opera gained support somewhere between versions 3.51 and 5.12 (ca. 2000). Konq/Safari/Chrome have supported it since their respective version 1.0s too (2000, 2003, 2008). Lynx has supported it for longer than I can find. I don't know about iCab, links or w3m. wget and telnet also support it if you remember to write the header yourself ;-)

38 . Taber on November 27th, 2009

Taber

Awesome, thanks Nicholas!

39 . cloved on January 28th, 2010

cloved

How can I do this if I use JavaScript?

40 . Cyrill on February 6th, 2010

Cyrill

Thanks for the script) It fits perfectly into the bootstrap class of ZendFramework)

41 . Manfred Kooistra on February 10th, 2010

Manfred Kooistra

What I don't understand is why you need to sort the languages by q value. It seems to me that they are already sorted by q value as they come from the browser.

42 . Manfred Kooistra on February 10th, 2010

Manfred Kooistra

Aha, I just read the Accept headers part in RFC 2616 at http://www.ietf.org/rfc/rfc2616.txt?number=2616 (section 14.1 starting on page 100), and the examples given are NOT ordered regarding q value. So it seems that, yes, you do need to sort languages.

In the $_SERVER manual page on php.net, an example of a regular expression to parse the Accept-Language header is given here: http://www.php.net/manual/en/reserved.variables.server.php#94237

43 . Manfred Kooistra on February 10th, 2010

Manfred Kooistra

Okay, one last thing. If I understand your code correctly, you test if either English or German are the MOST preferred language, not actually which one of both is MORE preferred.

Let's say you are in the US and offer a site in two languages, English and German, with the default language being English. Now your site is visited by a user who speaks no English, only French, German and Italian, and who has set his browser to prefer French over German, thus: fr,de;q=0.8,it;q=0.2. With your code this person will receive the default language version of the website, English, because German is not in the first position of the list (de === 0). This is bad, because he does not know English but would have been happy with the German version.

Instead of testing for either German or English being the FIRST language in the user's preferences, you should test which language comes before the other, no matter where in the list they appear.

The code for that could look something like this:

$sorted_languages = "";
foreach ($langs as $lang => $val) {
    $sorted_languages .= $lang . "-";
}

// strpos() returns an integer offset or false, never true
$de_pos = strpos($sorted_languages, 'de');
$en_pos = strpos($sorted_languages, 'en');

if ($de_pos === false && $en_pos !== false) {
    // show English site
} elseif ($de_pos !== false && $en_pos === false) {
    // show German site
} elseif ($de_pos !== false && $en_pos !== false) {
    if ($de_pos < $en_pos) {
        // show German site
    } else {
        // show English site
    }
} else { // if both return FALSE
    // show default site
}

44 . Jesse Skinner on February 10th, 2010

Jesse Skinner

@Manfred - my code does see which supported language has a higher q value, by first sorting the languages by q value (with arsort) and then looping over them, checking for the languages the site supports. The first matching one must have a higher, or equal, q value to any of the others.

Of course there are other techniques for working with the data; it all depends on what experience you want for your visitors.

45 . Manfred Kooistra on February 10th, 2010

Manfred Kooistra

Jesse, I misunderstood your loop. I was thinking each "strpos" was reading the whole array. I didn't differentiate between "$langs" and "$lang", because my attention was focused on understanding the "if strpos" which I have encountered for the first time here (being only a PHP amateur).

But there still appears to be a problem with your loop: it does not stop, when you find the preferred language but continues for all key-value pairs in your array. If you have both English and German in your preferences, or multiple instances of one language (en-ca, en, en-us), each of them results in a display of your website, one above the other. I'm surprised that no-one has found this in their resulting source code. Shouldn't you put an "exit()" in there? Like this:

foreach ($langs as $lang => $val) {
if (strpos($lang, 'de') === 0) {
// show German site
exit();
} else if (strpos($lang, 'en') === 0) {
// show English site
exit();
}
}

Because "if (strpos($lang, 'de') === 0)" is true for both "[de-de] => 0.4" and "[de] => 0.2", so your "foreach"-loop outputs the German website twice, because you have nothing to stop it. Same goes for your three instances of "en".

I hope I could make myself clear. It's kind of difficult to explain this without drawing a nice graphic :-)

46 . Jesse Skinner on February 10th, 2010

Jesse Skinner

@Manfred - you're absolutely right, the code needs to break the loop, either using break, exit, die or return. I left it to the imagination how to display the site, and make sure it's only displayed once.
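The early exit discussed above can be sketched as a function that returns as soon as a supported language is found. `pick_site_language()` is a hypothetical name, and the sketch assumes `$langs` has already been sorted by q value as in the article.

```php
<?php
// Hypothetical helper (not from the article): return the first supported
// language prefix found in an already-sorted $langs array, or $default.
function pick_site_language(array $langs, array $supported, string $default): string
{
    foreach ($langs as $lang => $q) {
        foreach ($supported as $prefix) {
            // stripos() is case-insensitive, so "DE-de" also matches "de"
            if (stripos((string) $lang, $prefix) === 0) {
                return $prefix; // returning ends both loops at the first match
            }
        }
    }
    return $default;
}

$langs = array('en-ca' => 1, 'en' => 0.8, 'en-us' => 0.6, 'de-de' => 0.4, 'de' => 0.2);
echo pick_site_language($langs, array('de', 'en'), 'en'), "\n"; // prints "en"
```

Because the function returns a language code instead of rendering anything, the site is displayed exactly once by whatever code calls it.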

47 . Michael on March 11th, 2010

Michael

Thank you for this perfect script :-)

I am developing an international Dating Site and need to determine the users language.

Currently it is only Danish. But I will implement other languages soon. (datingmatch.nu)

Thanks Again

48 . Marty on April 9th, 2010

Marty

Hi, this code looks almost exactly like what I need, though the most important thing for me is to separate UK vs US visitors. Is it possible to distinguish en-uk from en-us?

thanks for sharing! marty

49 . Lummo on April 9th, 2010

Lummo

Yes, but you need "en-gb" rather than "en-uk".

50 . marty on April 14th, 2010

marty

Fantastic, thanks Lummo, seems to work perfectly! In Firefox en-gb vs en-us works, but in most other browsers it needs to be en-GB or en-UK.

thanks very much!

51 . marty on April 14th, 2010

marty

sorry i meant en-GB, en-US

52 . marty on April 17th, 2010

marty

Did anyone notice that it's not working across all browsers?

Well, I thought it was working OK in all browsers after changing to uppercase, but that broke Firefox.
So I put two options for every country, en-us and en-US, but that meant 20 lines for 10 countries, which is kind of messy.

Then I read up on PHP, and replacing strpos with stripos seems to fix it, as stripos is case-insensitive.

Would it be simple to use 'case' instead of 'if else' for matching? I eventually want to have about 20 different regions, and from what I read it is more efficient.

thanks!
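One way the 'case' idea could look, as a sketch only: lower-casing the tag once means each region needs a single `case` line regardless of how the browser capitalises it. The function name and the region codes are made up for illustration.

```php
<?php
// Hypothetical mapping (not from the article) from language tags to
// region codes; the tag is lower-cased once so "en-GB" and "en-gb"
// take the same branch.
function region_for_tag(string $tag): string
{
    switch (strtolower($tag)) {
        case 'en-gb':
            return 'uk';
        case 'en-us':
            return 'us';
        case 'en-ca':
            return 'ca';
        default:
            return 'default';
    }
}

echo region_for_tag('en-GB'), "\n"; // prints "uk"
echo region_for_tag('en-us'), "\n"; // prints "us"
```

With around 20 regions, a lookup array keyed by the lower-cased tag would be another reasonable option.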

53 . Michael on May 25th, 2010

Michael

I think the best way to test the Accept-Language header is to use a tool that can modify headers, send the request, and show the response.

I use this free HTTP tool to test the header... enjoy
http://soft-net.net/SendHTTPTool.aspx

54 . Richard Heider on June 1st, 2010

Richard Heider

I just founded the Facebook group 'Facebook needs multi-language awareness' (http://www.facebook.com/group.php?gid=121786987860982) to push that issue there.

55 . Echo on June 10th, 2010

Echo

Very nice !

Thanks

56 . Keith on June 20th, 2010

Keith

One thing I don't get is why list individual varieties of a language in the Accept-Language header? I mean, I'm unaware of any variety of English I don't understand (well, Ebonics, perhaps), so I just specify the generic version. Here's my Accept-Language header's value:
eo,de;q=0.8,es;q=0.5,en;q=0.3
[I'm not even close to fluent in German and Spanish, but I figure getting webpages in those languages are a good way to improve my reading skills in them.]

About redirecting to a local server, like google.ae, I'm not seeing the problem. I'm sure you can still see the content in any supported language on any server, so viewing it on a more local server only makes sense, from a networking viewpoint.

57 . Richard Heider on June 20th, 2010

Richard Heider

@Keith. The content-negotiation mechanism is a general mechanism not just restricted to languages or the language a site is presented in.

What the site does with that info depends on the site and the context, e.g. the site might supply form letters or use a spell checker. It would then use the spelling appropriate for your region to serve you.

However I guess in practice you are right, that the generic language variant is sufficient.

58 . Felix on July 10th, 2010

Felix

This is very helpful! Things like that are not very complicated, but you saved at least one hour of my precious life by posting this one.

Thank you so much :)

59 . Dinar Q. on July 12th, 2010

Dinar Q.

I have also written code for this.
Example code:
http://qdb.tmf.org.ru/phpsinaw%28test%29-ici/accept-language.php
How it works: http://qdb.tmf.org.ru/phpsinaw%28test%29/accept-language.php
A variant of similar code running on a real site:
http://qdb.tmf.org.ru/minyasaganprogramlar/kukmara.ru/chat2/index.php (works at kukmara.ru/chat2/).
None of these sites work at night, roughly 23:20-7:00 GMT+4.

60 . goran on July 15th, 2010

goran

The biggest problem here might be users' lack of knowledge...I'm afraid most people don't know how to set up their browser in order to send appropriate accept-language header.

61 . mati on July 20th, 2010

mati

tanks!!

62 . Crazywater on September 14th, 2010

Crazywater

Hello,
your script is very short and elegant, do you license it under any particular license or is it just free to use for everyone?

63 . Jesse Skinner on September 14th, 2010

Jesse Skinner

@Crazywater - there's no formal license. Feel free to use this and any other code from my articles in your projects, but only at your own risk, of course.

64 . Crazywater on September 14th, 2010

Crazywater

Thank you very much! :)

65 . Rulatir on October 22nd, 2010

Rulatir

What about multiple languages with the same q value (like implicit q=1)? In this case their order of occurrence in the accept-language string encodes their relative preference. Unfortunately arsort() is *not* a stable sort, so the order will be lost. You should use a stable sort.

66 . Mike on November 10th, 2010

Mike

Thanks for posting this, very helpful :)

67 . Kathrin on January 22nd, 2011

Kathrin

Two relatively minor issues I see:

PHP has undefined results when sorting two equal values. (see usort docs)

Some clients do not specify q values, and trust the server to go with whatever was first. As such, it makes sense to retain the index and maintain it in the event of a tie. I am using usort to do this.

Second, it is possible to specify 0 for a q value, for cases where one wishes to explicitly state "do not give me this language". The regex doesn't handle this condition properly. Using '/([a-z]{1,8}(-[a-z]{1,8})?)\s*(;\s*q\s*=\s*(1|0\.[0-9]+|0))?/i' and checking isset ($matches[4]) seems to work nicely.
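Both points can be combined in one sketch: the pattern below is the one quoted in this comment, and ties are broken by position in the header. `parse_accept_language()` and the array layout are hypothetical, not the commenter's actual code.

```php
<?php
// Hypothetical helper: parse an Accept-Language value into entries
// sorted by q (highest first), keeping header order for equal q values.
function parse_accept_language(string $header): array
{
    // the q value may be 1, 0.xxx or a bare 0
    $pattern = '/([a-z]{1,8}(-[a-z]{1,8})?)\s*(;\s*q\s*=\s*(1|0\.[0-9]+|0))?/i';
    preg_match_all($pattern, $header, $matches, PREG_SET_ORDER);

    $entries = array();
    foreach ($matches as $index => $m) {
        // with PREG_SET_ORDER, group 4 is absent when no q value was sent
        $q = isset($m[4]) ? (float) $m[4] : 1.0;
        $entries[] = array('lang' => strtolower($m[1]), 'q' => $q, 'index' => $index);
    }

    usort($entries, function ($a, $b) {
        if ($a['q'] != $b['q']) {
            return ($a['q'] < $b['q']) ? 1 : -1; // higher q sorts first
        }
        return $a['index'] - $b['index'];        // tie: earlier in header first
    });

    return $entries;
}

foreach (parse_accept_language('de-AT, en-US, da;q=0.5, fr;q=0') as $e) {
    echo $e['lang'], ' => ', $e['q'], "\n";
}
```

Entries with q=0 are kept in the result so the caller can decide whether to skip them. The sketch inherits the pattern's limits: tags with more than one subtag and the `*` wildcard are not parsed correctly.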

68 . James Suvestor on January 27th, 2011

James Suvestor

Thanks

69 . David Van De Walle on February 12th, 2011

David Van De Walle

Why not keep things simple? One line of code:

$lang = substr($_SERVER['HTTP_ACCEPT_LANGUAGE'], 0, 2);

70 . Rick McKnight on March 1st, 2011

Rick McKnight

Great piece of code Jesse.
I looked into many other alternatives, but this seems the best by far.

Thanks a lot.

71 . Urahara san on March 21st, 2011

Urahara san

(sent my previous response before verifying the workings, sorry about that)

@Kathrin - you can make sort() stable by applying a (semi) Schwartzian transform before sorting.

Being more pragmatic I've done away with the need to verify the string format:


function getBrowserLanguages()
{
    if (!isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])) {
        return array();
    }
    $langs = array();
    foreach (explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']) as $k => $pref) {
        // split $pref again by ';q='
        // and decorate the language entries by inverted position
        if (false !== ($i = strpos($pref, ';q='))) {
            $langs[substr($pref, 0, $i)] = array((float)substr($pref, $i + 3), -$k);
        } else {
            $langs[$pref] = array(1, -$k);
        }
    }
    arsort($langs);

    // no need to undecorate, because we're only interested in the keys
    return array_keys($langs);
}

72 . Giżycko, Mazury - forum on June 29th, 2011

Giżycko, Mazury - forum

Very helpful, thank you.
I'm about to create a site with 40 languages, so choosing the first language is crucial for me and my future visitors :)
You can see my beginnings at ujagody.pl
regards

73 . med on August 7th, 2011

med

Hello,
Is this script still supported? I'm stuck on using it. Or maybe it's not up to date as I'm not getting accurate results.

74 . med on August 7th, 2011

med

OK I finally got an idea and it works well now

75 . Abhishek on March 13th, 2012

Abhishek

Thanks brother! The code works well for me.

76 . Mike on March 19th, 2012

Mike

I use this code: https://github.com/zendframework/zf2/blob/master/library/Zend/Locale/Locale.php#L582

77 . chris on April 6th, 2012

chris

Hi Jesse,
thanks for the post. It gave me some guidance on how to do this in C#.

78 . Falk on August 24th, 2012

Falk

hey people I made this:
if (isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])){
    $idiomes=array('es_ES','ca_ES');
    $langs = array();
    // hyphens are replaced by underscores first, so the pattern looks for "_" before the subtag
    preg_match_all('/([a-z]{1,8}(_[a-z]{1,8})?)\s*(;\s*q\s*=\s*(1|0\.[0-9]+))?/i', str_replace('-','_',$_SERVER['HTTP_ACCEPT_LANGUAGE']), $lang_parse);
    if (count($lang_parse[1])) {
        $langs = array_combine($lang_parse[1], $lang_parse[4]);
        foreach ($langs as $lang => $val) {
            if ($val === '') $langs[$lang] = 1;
        }
        arsort($langs, SORT_NUMERIC);
    }
    foreach ($langs as $lang => $val){
        foreach ($idiomes as $idioma){
            if (strtolower($lang)==strtolower($idioma)){
                return $idioma;
            }
            if (substr($lang,0,2)==substr($idioma,0,2)){
                return $idioma;
            }
        }
    }
}

In my case I'm in a function which checks first of all the $_GET variable (so the user can choose), then the $_SESSION, then the $_COOKIE, and at the end the browser.
Thanks for the code, it's useful and easy to understand.
Sorry for my English I'm a Spanish speaking German.

79 . Dan on December 9th, 2012

Dan

Dear Jesse,

thanks a lot for pointing me in the right direction. :-)

I improved your code by my tuned foreach-loop for the q factors:

<schnipp>
$ctr = 0;
foreach ($langs as $lang => $val) {
    if ($val === '') {
        $langs[$lang] = round(1-($ctr++/count($langs)), 1);
    }
}
<schnapp>

This code builds its own priority from one to zero to handle those nasty language pairs like
'de-AT, en-US'
which collide with each other but which I have found in real life. In this case the sequence matters.

Maybe that helps somebody.

Cheers
Dan
