How to write an amazing spreadsheet
How to do statistical computing?
There’s no way around it.
I’ve been trying to figure out how to make my own spreadsheet for a while now, and I finally worked it out.
This post takes a while to get going, but the steps themselves are pretty straightforward.
The goal is to build a spreadsheet that can handle all sorts of data, and then compare the output from that spreadsheet with the output from other, similar statistical tools.
I’m hoping to write up some more posts about this as I learn more.
For now, here’s how you can start.
Step 1: Find the data source for your spreadsheet
Step 2: Create a spreadsheet to represent the data
Step 3: Get the data you want to compare the data with
Step 4: Find a formula that produces the desired output
Step 5: Combine the output of the two calculations
Step 6: Repeat steps 5-6
Step 7: Repeat Steps 4-6 step-by-step
Step 8: Calculate the average, median, mode, and variance of your data
Source: Wired, via Gizmodo
The basics of statistical computing aren’t exactly new.
The basic idea is to use statistical tools to find correlations between various types of data.
In particular, you might want to find relationships between the things you know about your data and those you know nothing about.
If you’re interested in how statistical computing works, there’s an excellent Wikipedia article on the subject.
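To make "finding correlations" a bit more concrete, here’s a minimal sketch of the Pearson correlation coefficient between two columns. The column names and numbers are made up for illustration, and Pearson’s r is just one common way to measure correlation, not necessarily the one any particular tool uses.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length with at least 2 rows")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each column.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical columns, invented for this example only.
hours_studied = [1, 2, 3, 4, 5]
test_score = [52, 55, 61, 70, 72]
r = pearson_r(hours_studied, test_score)
print(f"r = {r:.3f}")
```

A value near 1 or -1 suggests a strong linear relationship, and a value near 0 suggests little or none. In Excel or Google Sheets, the built-in CORREL function should give you the same number for the same two columns.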
The problem with this basic idea is that you have no way of knowing in advance what the results of any one statistical tool are going to be.
There’s a lot of uncertainty, and the only way to get a general picture of the state of the world is to ask some very general questions, like “What’s the likelihood of X happening?” or “What does the median of the distribution of X’s values say about X?”
There’s nothing you can do about that, except make a spreadsheet of your own and compare that with the results from some other, similar tool.
It’s like trying to pin down what “the average” means: instead of taking the average of some arbitrary data set on faith, you compute the mean and median of a random data set yourself.
This isn’t particularly difficult, and you can probably get away with doing it with just a few tables and a couple of random numbers.
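Here’s a rough sketch of that "couple of random numbers" experiment, using Python’s standard library. The seed, the range, and the number of values are all arbitrary choices of mine.

```python
import random
import statistics

# Fixed seed so the "random" column is the same on every run.
rng = random.Random(42)

# 1000 values drawn uniformly between 0 and 100 (arbitrary choices).
values = [rng.uniform(0, 100) for _ in range(1000)]

sample_mean = statistics.mean(values)
sample_median = statistics.median(values)

print(f"mean   = {sample_mean:.2f}")
print(f"median = {sample_median:.2f}")
```

For a uniform range like this, both numbers should land somewhere near 50, which gives you a known answer to check your spreadsheet against.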
But what if you’re using statistical tools for something more specific?
Say you’re doing something like running a regression model.
You might be interested in finding correlations between certain characteristics of your users or their friends, and your data might have some other characteristics, like the age of your target user.
Or perhaps you want some information about your target users’ relationship to their parents, so you can get a better idea of what your users are like.
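For the regression case, a minimal sketch might look like the following. It fits a straight line with ordinary least squares, which is only the simplest kind of regression model, and the user ages and session counts are hypothetical numbers I made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two columns of equal length with at least 2 rows")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when every x is the same")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: user age vs. sessions per week.
ages = [18, 25, 32, 41, 56]
sessions_per_week = [9, 8, 6, 5, 3]

slope, intercept = fit_line(ages, sessions_per_week)
print(f"sessions = {slope:.3f} * age + {intercept:.3f}")
```

In a spreadsheet, the SLOPE and INTERCEPT functions should produce the same two numbers, which makes this an easy result to cross-check.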
It turns out that there’s a whole slew of different statistical tools that can be used to do all of this, and that’s what this post is about.
We’re going to build one spreadsheet, and we’re going to do it in order.
Step One: Find what kind of data you’re looking for
Step Two: Create the spreadsheet
Step Three: Build the spreadsheet
You can find out the names of the data sources and the statistical tools you need by searching for the keywords “statistical computing,” “computer programming,” “software development,” “statistics,” “deterministic models,” “models,” “analysis,” “experimental methods,” “predictive modeling,” “machine learning,” “model-driven computing,” or whatever else you want.
You can find similar tools in most places online, but I’ll assume that you already have a spreadsheet somewhere on your computer.
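If you want to check your spreadsheet with another tool, one simple route is to export it as CSV and read a column back in. This is a small sketch of that, assuming a CSV export with a header row. The inline text stands in for a real file, and the column names are hypothetical.

```python
import csv
import io

# Stand-in for a CSV file exported from a spreadsheet (made-up data).
CSV_TEXT = """name,score
a,3
b,5
c,4
"""


def read_numeric_column(csv_file, column):
    """Read one column from a CSV file object as floats, skipping blank cells."""
    reader = csv.DictReader(csv_file)
    values = []
    for row in reader:
        cell = (row.get(column) or "").strip()
        if cell:
            values.append(float(cell))
    return values


# With a real export you would pass open("your_file.csv", newline="") instead.
scores = read_numeric_column(io.StringIO(CSV_TEXT), "score")
print(scores)
```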
Step Four: Use the spreadsheet to calculate the mean, median and mode
Of course, if you’ve made the spreadsheet and you’re not sure how to use it, there are many websites that will show you how to get the data and figure out what’s going on.
I used Statcast for the purposes of this post, and for some of the calculations.
There are also plenty of other websites that have a great guide for building a statistical spreadsheet, like this one.
If the spreadsheet doesn’t come up, try opening it in Google Docs.
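To have something to check the Step Four numbers against, here’s a minimal sketch using Python’s statistics module. The data column is made up, and I’ve included the sample variance as well since it comes up in the step list near the top.

```python
import statistics

# Hypothetical column of values, invented for this example.
data = [2, 3, 3, 5, 7, 10]

summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    # multimode returns every most-common value, so ties are not hidden.
    "modes": statistics.multimode(data),
    # Sample variance (divides by n - 1).
    "variance": statistics.variance(data),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The spreadsheet equivalents are the AVERAGE, MEDIAN, MODE and VAR functions. If those disagree with the numbers above for the same column, the first thing I’d check is whether the two tools are using the same kind of variance (sample vs. population).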
Step Five: Compare the results
From there, you can compare the results against other tools to see how close they are to your desired output.
The best way to do this is to do a quick spreadsheet comparison.
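As a rough sketch of what a quick comparison could look like outside the spreadsheet, the snippet below checks two sets of results against each other with a tolerance, since different tools rarely agree to the last decimal place. The result values and the tolerance are assumptions of mine, not numbers from any real tool.

```python
import math


def compare_outputs(mine, theirs, rel_tol=1e-9, abs_tol=0.0):
    """Return the statistics whose values disagree beyond the tolerance."""
    mismatches = {}
    for name in sorted(set(mine) | set(theirs)):
        a = mine.get(name)
        b = theirs.get(name)
        # A statistic missing from either side also counts as a mismatch.
        if a is None or b is None or not math.isclose(
            a, b, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            mismatches[name] = (a, b)
    return mismatches


# Hypothetical results from the spreadsheet and from another tool.
spreadsheet_result = {"mean": 5.0, "median": 4.0}
other_tool_result = {"mean": 5.0000000001, "median": 4.5}

diff = compare_outputs(spreadsheet_result, other_tool_result, rel_tol=1e-6)
print(diff)
```

Here the two means count as equal within the tolerance, and only the median gets flagged.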
In my example, I used the standard plot, which gives you a nice overview of the output and gives you the raw data that you’re comparing against.
Then, I did a quick test of the median, which is a more specific check that tells you how much of the difference in your output is likely to be due to chance.
You can find more information about the standard plots on the Statcast site.
Once you’ve done that, you should have something that looks like this:
Now, if your spreadsheet doesn’t