Saturday, February 6, 2016

Linguistics of statistics

I know what you are thinking.  Why? Right!? Mixing math (or maths, as the British say) with language sounds like heresy. However, math is a language, albeit one most people would rather not have to learn.  And, fascinatingly, understanding its linguistics amounts to understanding the subject itself, in this case statistics. As with any language, if you understand the definitions of the words and the concepts behind them, the application becomes much easier.  And this holds true for any branch of mathematics.

So, as a budding linguist, I have come to treat the challenge of statistics as a language-learning process and not a 'math thing'.  I like math anyway, so it's not like it's a killer idea for me.  But from a research perspective, viewing statistics linguistically has merit.  If you understand the meaning of what you are looking at and what you are looking to do with it, then the task doesn't seem as daunting.  It's about communicating data and ideas: translating data into understandable terms and finding meaning in what has been collected.  Sounds like a linguistics thing to me.

So, like any language, we start by understanding some basic vocabulary words.  I like the idea that statistics is descriptive as well as analytical; I can cope with both of those ideas.  It deals with population samples and variables.  See, not so bad, right?  Variables can be nominal (categories with no order), ordinal (categories that are ordered), or scale (continuous, where the numbers mean something, like age).  Scale variables can be interval (degrees C or F) or ratio (weight). These types determine the statistical techniques that can be used to analyze the data set.

Slightly harder to process are the ideas of mean (average), median (middle), mode (most frequent), standard deviation (how spread out the values are around the mean), skewed (probably like my view of this concept), histograms, bar charts, pie charts, box plots, scatterplots, normal distribution (you know, the bell curve), z scores (no, not worse than failing...), outliers (yes, those that are out of the group and causing you trouble), confidence interval (the range you can be fairly confident contains the true population value), standard error...
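To make a few of those vocabulary words concrete, here is a minimal sketch in Python using only the standard library. The `scores` list is made-up data and `describe` is a hypothetical helper name, neither comes from any real study. The confidence interval uses the normal approximation, which is an assumption; with a sample this small a t-based interval would usually be the safer choice.

```python
import math
import statistics


def describe(values):
    """Return a few basic descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample standard deviation (divides by n - 1)
    se = sd / math.sqrt(n)             # standard error of the mean
    # Critical value for a 95% interval under the normal approximation (about 1.96).
    z_crit = statistics.NormalDist().inv_cdf(0.975)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sd": sd,
        "se": se,
        # z score: how many standard deviations each value sits from the mean.
        "z_scores": [(x - mean) / sd for x in values],
        "ci95": (mean - z_crit * se, mean + z_crit * se),
    }


# Made-up sample data, purely for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(scores)

for name, value in summary.items():
    print(name, value)
```

Each key in the result is one "word" from the list above, computed from the same handful of numbers, which makes it easier to see how the terms relate to each other.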

See, vocabulary. The vocabulary is then reduced to symbols, and the symbols are placed in equations (sentences). You then find the numbers they represent and manipulate them to find answers. You get things like the answer to life, the universe, and everything, which is, according to The Hitchhiker's Guide to the Galaxy, 42.  However, they didn't give the confidence interval or the standard deviation... nor did they report whether they used linear regression or a chi-square test. Not sure how statistically significant that number actually is...
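As a small illustration of vocabulary turning into symbols and symbols into sentences, here are three standard textbook definitions written out in LaTeX. The notation is the conventional one and is my choice for the example: $x_i$ is a single observation, $n$ is the sample size, $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, and $z_i$ is the z score of observation $x_i$.

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\qquad
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}
\qquad
z_i = \frac{x_i - \bar{x}}{s}
```

Read aloud, the last one is just a sentence: "the z score is how far a value sits from the mean, measured in standard deviations."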

So I will work hard to learn the linguistics of statistics so that I can use statistics to analyze linguistics.  I don't know about you, but I'm excited.  Care to join me? 

1 comment:

  1. 42 - your inner nerd is showing!

    I too figured math out while speaking Spanish. Once I put it as a language, it became a whole lot easier to do. I just wish I had figured it out a lot sooner. I may have been an engineer instead!
