| Svelte Hacker News

points by tokenadult 13 years ago

From the article: "Using Survata, a web-based market research service"

Well, that's the problem that caused the crazy results that most comments here are mentioning. That's a voluntary response survey, which means that it almost surely doesn't represent the general population. (By the way, my first answer to a question like that would be "Huawei," but then I could think of plenty more, including brands that are only sold in China, which I have visited.)

Here's a FAQ about the junk data from voluntary response surveys: As I commented previously when we had a poll on the ages of HNers, the data can't be relied on to make such an inference. That's because the data are not from a random sample of the relevant population. One professor of statistics, who is a co-author of a highly regarded AP statistics textbook, has tried to popularize the phrase that "voluntary response data are worthless" to go along with the phrase "correlation does not imply causation." Other statistics teachers are gradually picking up this phrase.

-----Original Message-----
From: Paul Velleman [SMTPfv2@cornell.edu]
Sent: Wednesday, January 14, 1998 5:10 PM
To: apstat-l@etc.bc.ca; Kim Robinson
Cc: mmbalach@mtu.edu
Subject: Re: qualtiative study

Sorry Kim, but it just aint so. Voluntary response data are worthless. One excellent example is the books by Shere Hite. She collected many responses from biased lists with voluntary response and drew conclusions that are roundly contradicted by all responsible studies. She claimed to be doing only qualitative work, but what she got was just plain garbage. Another famous example is the Literary Digest "poll". All you learn from voluntary response is what is said by those who choose to respond. Unless the respondents are a substantially large fraction of the population, they are very likely to be a biased -- possibly a very biased -- subset. Anecdotes tell you nothing at all about the state of the world. They can't be "used only as a description" because they describe nothing but themselves.

http://mathforum.org/kb/thread.jspa?threadID=194473&tsta...

For more on the distinction between statistics and mathematics, see "Advice to Mathematics Teachers on Evaluating Introductory Statistics Textbooks"

http://statland.org/MyPapers/MAAFIXED.PDF

and "The Introductory Statistics Course: A Ptolemaic Curriculum?"

http://escholarship.org/uc/item/6hb3k0nz

I think Professor Velleman promotes "Voluntary response data are worthless" as a slogan for the same reason an earlier generation of statisticians taught their students the slogan "correlation does not imply causation." That's because common human cognitive errors run strongly in one direction on each issue, so the slogan has to take the cognitive error head-on. Of course, a distinct pattern in voluntary responses tells us SOMETHING (maybe about what kind of people come forward to respond), just as a correlation tells us SOMETHING (maybe about a lurking variable correlated with both things we observe), but it doesn't tell us enough to warrant a firm conclusion about facts of the world. The Literary Digest poll

http://historymatters.gmu.edu/d/5168/

http://www.math.uah.edu/stat/data/LiteraryDigest.pdf

is a spectacular historical example of a voluntary response poll with a HUGE sample size and high response rate that didn't give a correct picture of reality at all.
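To make the point about sample size concrete, here is a minimal simulation sketch. Every number in it is an assumption chosen for illustration (a made-up population where 40% hold some opinion, and holders of that opinion are three times as likely to volunteer a response); none of it is data from the Literary Digest poll or from the survey in the article. It compares a very large voluntary response sample with a small simple random sample drawn from the same population.

```python
import random


def simulate(seed=0, population_size=200_000, true_share=0.40,
             p_respond_yes=0.30, p_respond_no=0.10, srs_size=1_000):
    """Compare a large voluntary sample with a small simple random sample.

    All parameters are hypothetical. True means "holds the opinion".
    """
    rng = random.Random(seed)

    # Build the population: each person holds the opinion with prob. true_share.
    population = [rng.random() < true_share for _ in range(population_size)]
    actual = sum(population) / population_size

    # Voluntary response: people with the opinion are more likely to respond.
    volunteers = [
        holds for holds in population
        if rng.random() < (p_respond_yes if holds else p_respond_no)
    ]

    # Simple random sample: every person is equally likely to be chosen.
    srs = rng.sample(population, srs_size)

    return {
        "actual": actual,
        "voluntary_n": len(volunteers),
        "voluntary_estimate": sum(volunteers) / len(volunteers),
        "srs_n": srs_size,
        "srs_estimate": sum(srs) / srs_size,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value}")
```

Under these assumptions the voluntary sample is dozens of times larger than the random sample, yet its estimate settles near 0.12 / 0.18, about two thirds, rather than near the true 40%. Adding more volunteers only makes the wrong answer more precise.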

When I have brought up this issue before, some other HNers have replied that there are some statistical tools for correcting for response-bias effects, IF one can obtain a simple random sample of the population of interest and evaluate what kinds of people respond. But we can't do that here on HN.
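One such correction tool is post-stratification weighting. The sketch below is an illustration only: the group names, the answers, and the population shares are invented, and the function name `poststratified_mean` is hypothetical. It assumes the true population share of each group is known from some trustworthy source (a random sample or a census), which is exactly what is missing for an HN poll.

```python
def raw_mean(responses):
    """Unweighted share of 'yes' (1) answers across all respondents."""
    answers = [a for group in responses.values() for a in group]
    return sum(answers) / len(answers)


def poststratified_mean(responses, population_shares):
    """Weight each group's mean by its known share of the population.

    responses: dict mapping group name -> list of 0/1 answers
    population_shares: dict mapping group name -> share (should sum to 1)
    """
    total = 0.0
    for group, share in population_shares.items():
        answers = responses[group]
        total += share * (sum(answers) / len(answers))
    return total


# Hypothetical respondents: the "under_30" group is heavily over-represented.
responses = {
    "under_30": [1] * 60 + [0] * 20,   # 80 respondents, 75% yes
    "30_plus": [1] * 5 + [0] * 15,     # 20 respondents, 25% yes
}
# Assumed true population shares, taken as known for this illustration.
population_shares = {"under_30": 0.5, "30_plus": 0.5}

print(raw_mean(responses))                                 # unweighted estimate
print(poststratified_mean(responses, population_shares))   # reweighted estimate
```

Reweighting moves the estimate from 0.65 to 0.50 here, but only because the group shares were assumed known. It also does nothing about bias inside a group: if the under-30s who choose to respond differ from the under-30s who do not, the weighted figure is still off.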

Another reply I frequently see when I bring up this issue is that the public relies on voluntary response data all the time to make conclusions about reality. To that I refer careful readers to what Professor Velleman is quoted as saying above (the general public often believes statements that are baloney) and to what Google's director of research, Peter Norvig, says about research conducted with better data,

http://norvig.com/experiment-design.html

that even good data (and Norvig would not generally characterize voluntary response data as good data) can lead to wrong conclusions if there isn't careful thinking behind a study design. Again, human beings have strong predilections to believe certain kinds of wrong data and wrong conclusions. We are not neutral evaluators of data and conclusions, but have predispositions (cognitive illusions) that lead to making mistakes without careful training and thought.

Another frequently seen reply is that sometimes a "convenience sample" (this is a common term among statisticians for a sample that can't be counted on to be a random sample) of a population offers just that, convenience, and should not be rejected on that basis alone. But the most thoughtful version of that frequent reply I have previously seen in online discussion did correctly point out that if we know from the get-go that the sample was not done statistically correctly, then even if we are confident (enough) that HN participants are young, we wouldn't want to extrapolate from that to conclude that the users of any technology site are young, or that users of the Internet as a whole are young.

On my part, I wildly guess that most HNers are younger than I am in part because this kind of poll recurs often on HN. Other preoccupations of younger rather than older people make up frequent topics on HN, and I've tried looking for signs that there are large hidden numbers of old participants here without finding many.

dsugarman 13 years ago

Thanks for your response. To account for this, we also asked the same question about Japanese companies. The results, as you can see, were very different and show clearly that there is low Chinese brand recognition in the US.

I invite you to ask your friends & family. Before running this survey we casually polled acquaintances. The results we found were lower than 1 in 20.

awenger 13 years ago

Survata co-founder here. To clarify, Survata is not a voluntary response sample. Voluntary samples often have a bias because the individuals who choose to respond are those with strong feelings on a topic. For our surveys, the primary incentive is access to premium content - and not a desire to express one's opinion on a topic. We aim to have a respondent pool that truly represents the population.

rmc 13 years ago

Y'know, it just occurred to me that many democracies use voluntary responses, since they only count the votes from people who go out to vote.