I've seen languages ranked by many diverse criteria, including number of speakers, economic power of countries where the languages are spoken, and so on. But I've always felt like there were large numbers of people being forgotten and important details being overlooked.
How many people speak the language?
Population figures are relatively easy to find, as are data indicating the official language(s) of each country. However, this isn't always a very realistic measure of a language's speakers.
For example, Russian is no longer the official language of many former Soviet countries, but it is still heavily spoken throughout the former Soviet Union. If one were to compile a list correlating populations to official languages, Russian would be extremely under-represented for native speakers... and that's not counting those who speak it as a second language.
When making judgments based on the number of speakers of a language, it is important to use numbers that fairly reflect those speakers, both native speakers and those who speak as a second language, regardless of the government of the country where they live.
How widespread is the language?
But the number of speakers isn't, by itself, a fair measure either. Close to 1 billion people speak Mandarin Chinese, for example, a figure which eclipses English, the next most widely spoken language, by a factor of two. But English is spoken heavily in 33 countries, and to a lesser extent in another 82, whereas Mandarin is only measurably spoken in 5.
The significance of such figures is that if you learn Mandarin as a second language and then diplomatic relations break down with one of the countries where it is spoken, you've just lost 20% of your potential audience, whereas if you learn English as a second language, losing opportunities in one country only costs you 3% of your audience.
Naturally, 20% and 3% are over-simplifications which assume an equal distribution of speakers in each country, but they are enough to make the point clear: population figures for official speakers aren't enough.
Other data, such as the GDPs of those countries, are interesting predictors of opportunities both for business and tourism, but GDP is necessarily flawed because it is tied to particular countries, so it can't accurately reflect native speakers of non-official languages. And it completely ignores second-language speakers.
But more importantly, the choice to use economic data as a criterion makes some rather broad assumptions about the ways in which language is used.
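To make the arithmetic behind those two percentages concrete, here is a minimal Python sketch. It assumes, as the text does, that speakers are spread equally across countries, so losing one country costs 1/n of the audience. The function name `share_lost` is just a label for this illustration, and only the country counts (5 and 33) come from the figures above.

```python
def share_lost(countries: int) -> float:
    """Fraction of the audience lost when one country becomes unreachable.

    Assumption: speakers are distributed equally across all countries,
    which is the same simplification used in the text.
    """
    if countries < 1:
        raise ValueError("countries must be at least 1")
    return 1 / countries


# Country counts taken from the text: Mandarin in 5, English (heavily) in 33.
mandarin_loss = share_lost(5)
english_loss = share_lost(33)

print(f"Mandarin: {mandarin_loss:.0%} of audience lost")
print(f"English: {english_loss:.0%} of audience lost")
```

With real per-country speaker counts the shares would differ from country to country, but the comparison between a language concentrated in a few countries and one spread across many would still hold in the same direction.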
A good example of this is Japanese. Economically, Japan ranks very high. So based on GDP, learning Japanese looks extremely useful. But Japanese is only spoken in Japan, and almost nowhere else on earth, so knowing Japanese would do nothing to increase your tourism prospects (beyond, obviously, travel to Japan).
And worse, it puts all the eggs in one basket, so to speak. If relations with Japan ever broke down (however unlikely that is), there is no fallback use for that language skill.
So any language ranking that's going to be useful needs to fold in all of these considerations. A language shouldn't be ranked too high if it has limited use, but it shouldn't be ranked too low if it has a lot of speakers. And it shouldn't be arbitrarily rated based on the economic conditions of any country where it is spoken.
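As a rough illustration of what "folding in" more than one consideration could look like, here is a small Python sketch that combines speaker count and geographic spread into a single score. The scoring formula (a geometric mean of two normalized values) is purely my own assumption for the sake of example, not the method behind any published ranking. The speaker and country figures are the approximate ones quoted earlier in this post, with English taken as roughly half of Mandarin's 1 billion.

```python
from math import sqrt

# Approximate figures from the text above; treat them as illustrative only.
languages = {
    "Mandarin": {"speakers": 1_000_000_000, "countries": 5},
    "English": {"speakers": 500_000_000, "countries": 33},
}


def composite_scores(data):
    """Score each language by the geometric mean of reach and spread.

    Hypothetical formula: both inputs are normalized against the largest
    value in the data set, so neither a huge speaker count nor a wide
    spread can carry a language on its own.
    """
    max_speakers = max(v["speakers"] for v in data.values())
    max_countries = max(v["countries"] for v in data.values())
    scores = {}
    for name, v in data.items():
        reach = v["speakers"] / max_speakers
        spread = v["countries"] / max_countries
        scores[name] = sqrt(reach * spread)
    return scores


scores = composite_scores(languages)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A geometric mean is used here because a low value on either axis drags the whole score down, which matches the idea that a language with many speakers but limited spread shouldn't rank too high. A different weighting would be just as defensible.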
Good data, if old
Recently, I came upon a post from Christopher Nelson which seemed to address these concerns. When I asked him where he got his data, he pointed me to this web site, which uses data from language surveys from 1997-1999. With the exception of the data being a bit stale, it's the most relevant ranking I've seen so far.
Notably missing are Turkic languages. My assumption is that because they go by different names (Turkish, Uzbek, Uyghur, Kazakh, etc.), it is difficult, or at least non-obvious, to combine them statistically. If 10 different Arabic dialects can be combined, it seems to me that Turkic languages should get that advantage as well.
Nonetheless, in spite of that oversight and the age of the data, I think the resulting list is pretty good, and surprisingly close to my expectations, based on my complete world-traveler language list from a few months ago.
It is encouraging for me to see German make this list, as its notable absence on my own list drew a lot of comments. Of course my list had a slightly different motivation, including an increased value for keeping the list shorter.
Interestingly, with the exception of Korean (which is missing from this list), this looks a lot like the list of top 10 languages used on the web, which would seem to suggest that any polyglot who spoke these 10 languages could effectively have an audience of more than 80% of the entire internet!