playing with numbers

Sunday, May 6, 2007

Beets, Al Gore, and other Drama (sort of)

Nicole made an interesting post recently about the arguably horrifying lack of scientific knowledge/understanding among Americans (e.g. lack of belief in evolution and global warming) and the debate among scientists about what can be done about it. Read her post first for a good summary of the basic ideas for addressing this problem (i.e. teach science in detail, which may bore some, or simplify it to increase accessibility at the expense of some depth of substance).

I think (perhaps somewhat pessimistically) that the problem of the public understanding science, and maybe more importantly, the philosophy of science, isn't something which can ever fully be resolved. It seems to be human nature to want to compartmentalize ideas into definite categories - this is right, this is wrong, this is true, this is false - according to what we've been brought up with, what our political party thinks, or what we want to believe. The public doesn't want to hear that something is under investigation and there are a number of different theories, all supported by different kinds of evidence, and that the true answer is a complex integration of them that we haven't quite figured out yet. People are uncomfortable with uncertainty and it's often easier to ignore other points of view and just focus on what makes yours "right." And if you and yours are right, that means someone else is wrong, which is good for self-esteem, group cohesion, etc. I think people like to define an Other, so they have someone to elevate themselves and their group above or to blame things on, creating an us vs. them mentality that only serves to alienate differing viewpoints and make them more extreme. It's easy to maintain your viewpoint when you're fighting against "godless liberals" or "closed-minded Republicans"; it's not so easy to hover somewhere in the middle, belonging to no group and unable to sum up your standpoint in a 10-second sound bite.

The other day while out to dinner I mentioned something about how I'd gone to see Al Gore's presentation last week. Now, I don't particularly like the guy or agree with all of his politics, but I appreciate what he's doing in terms of raising the public consciousness about climate change. Anyway, a friend of mine immediately said "oh I don't believe in that global warming stuff" - spoken in the same tone that one might express distaste for beets, or country music, or Tuesdays. I was baffled, considering I don't think she's had a science class since high school and freely admits that she knows virtually nothing about it. Unfortunately I think this is the way a lot of people come to their beliefs about important issues: they base them on what their parents/church/friends are telling them, or on their opinion of the messenger, or on their own speculations about how the world works. And while we could definitely be doing better with science education, I don't see those fundamental aspects of humanity going away. Evolution didn't prepare our minds for scientific inquiry; it equipped us to learn from our families and social groups. Our tendency to furiously guard our pride and our reluctance to admit being wrong don't help either!

So, I don't really have a solution to propose. Obviously we should be doing something about the way science is taught in this country since there are countries where people are more scientifically literate. So it is possible to make substantial improvements - it just will never be 100% of the population that has the necessary understanding. But hey, in science you never really get perfect results anyway, right?

Thursday, April 26, 2007

Analyzing training data

So one of my ideas throughout the semester for the final project has been to see what kinds of analyses I could perform on the training data I’ve accumulated over the past few years. There was a comment a while back asking about what kind of information is in these training logs and I’ve been meaning to make this kind of explanatory post for a while now. At this point I consider myself pretty darn good at making insanely and probably unnecessarily complicated/intricate Excel logs, which seem to be an endless source of amusement and good-natured mockery from my friends :)

Anyway, the level of detail (and the consistency) of these training logs varies drastically over the past two years or so; the more recent they are, the more data I have tended to record, and it’s only in the past semester that I really consider my spreadsheets to contain “enough” data, just for my personal use to look back on. How much statistical analysis I’ll be able to perform on the rest remains to be seen.

Here is what I currently keep track of--

For all cardiovascular work:

sport (swim, bike, run, other)
duration
distance (yards, miles)
pace (min/100 yards, mph, min/mile)
RPE (rating of perceived exertion)
average heart rate
max heart rate
skill trained (endurance, force, speed, muscular endurance, anaerobic endurance, power)
workout done (e.g. 3x8/4r = 3 reps of 8 minutes high intensity followed by 4 min recovery)
notes on the workout: can be details about lap times (to get an idea of changing pace & HR throughout), any nagging pains or developing injuries, general comments/thoughts
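The shorthand in the "workout done" field is regular enough to be read by a program. Here is a minimal sketch in R of how that might look; the function name parse_workout and the decision to count a recovery interval after every rep (including the last one) are assumptions made for illustration, not something taken from the actual spreadsheet.

```r
# Hypothetical helper: split a workout code such as "3x8/4r" into its parts.
parse_workout <- function(code) {
  # reps "x" work minutes "/" recovery minutes "r"
  pattern <- "^([0-9]+)x([0-9]+)/([0-9]+)r$"
  parts <- regmatches(code, regexec(pattern, code))[[1]]
  if (length(parts) == 0) stop("unrecognized workout code: ", code)
  nums <- as.numeric(parts[-1])  # drop the full match, keep the three groups
  list(
    reps = nums[1],
    work_min = nums[2],
    recovery_min = nums[3],
    # Assumption: a recovery interval follows every rep, including the last.
    total_min = nums[1] * (nums[2] + nums[3])
  )
}

w <- parse_workout("3x8/4r")
w$total_min  # 3 * (8 + 4) minutes
```

Keeping the code as text in the log and parsing it only when needed means the spreadsheet stays readable while the numbers remain available for analysis.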

Since I focus on triathlon, I have more detailed information about the following:

SWIM: stroke used, pool or open water
BIKE: bike ridden, route, hills (1-9), wind (from weather.com)
RUN: shoes worn, surface (dirt, pavement, track, etc), hills (1-9)
LIFT: sets, reps, poundages for each exercise done

I also summarize the major information for each sport for each week and make a few graphs: total weekly volume (with breakdown for each sport), total weekly distance per sport, and the overall breakdown of how much time is spent in each discipline.
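For anyone curious how the same weekly summary might be produced outside of Excel, here is a rough sketch in R. The rows are made-up placeholder values, and the column names (week, sport, duration_min, distance) are assumptions for illustration only, not the layout of the real log.

```r
# Made-up placeholder rows, for illustration only (not real training data).
training_log <- data.frame(
  week = c(1, 1, 1, 2, 2),
  sport = c("run", "bike", "run", "run", "swim"),
  duration_min = c(40, 60, 50, 45, 30),
  distance = c(5, 15, 6, 5, 1500),  # miles for run/bike, yards for swim
  stringsAsFactors = FALSE
)

# Total weekly volume in minutes, all sports combined.
weekly_volume <- aggregate(duration_min ~ week, data = training_log, FUN = sum)

# Weekly time and distance broken down by sport.
by_sport <- aggregate(cbind(duration_min, distance) ~ week + sport,
                      data = training_log, FUN = sum)

# Average running pace per week (min/mile) from summed time and distance.
runs <- by_sport[by_sport$sport == "run", ]
runs$pace_min_per_mile <- runs$duration_min / runs$distance
```

Distances are only summed within a sport here, since yards of swimming and miles of running are not comparable units.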

If you’re not completely convinced I’m a lunatic by this point, I present you with the following screenshots!

Different colors correspond to workouts of different types (e.g. endurance, speedwork) and allow me to plan weeks out ahead of time without having to decide exactly which specific workout I will do, just by filling in an appropriate mix of colors in the boxes for all the sports on various days. I usually have the runs planned out a few weeks ahead of time since my focus has been a May marathon for which there are specific weekly mileages and distances I must make; I then fill in the other sports at the beginning of the week, making adaptations depending on daily commitments. As you can see, the past week and a half has not gone as it should have...

An example of the sport-specific information, in this case running.

Craziness. But it's useful.

You may notice the uh, mild decrease in activity on the most recent week, where all I did was one weight session the entire week. Unfortunately I started having major iliotibial band problems during runs and have had to take some time off so that it doesn’t turn into a recurring injury. Neglecting exercise is stressful enough but it’s coupled with the likely prospect that the marathon I was training for all semester is not going to happen for me. A thought that’s more than a little devastating – but, better to lay off now and miss the race than to ignore the pain and injure myself for the rest of the summer. With that in mind, I’ve been trying to think of some questions that I could ask of these data for an analysis. Here are a few:

What factors predict incidence of joint/tendon pain or injury?
How does weekly training volume affect paces?
Does running surface/shoe type affect pace?
What kind of relationship exists between heart rate, RPE, and pace?
How does sleep impact speed, distance, RPE?
How does daily volume relate to stress?
Which weightlifting exercises lead to greatest strength increases?
How much lifting volume leads to the fastest gains?

I don’t know if I have the consistency of data for a long enough period of time (e.g. a year) to fully answer these questions but if I can figure out a way to model some of these things, then in the future I can always go back and enter the larger data sets to see if the answers change. It would be really interesting to use this to find out information that would be extremely relevant to me. Combining personal life and intellectual interests so closely is definitely a unique opportunity!

Tuesday, March 27, 2007

On diligence

Today's class discussion, particularly the points on recording & entering study data in a timely manner, was very interesting/relevant to me. While I haven't done a lot of research, I do keep track of a lot of "data" (some of it numeric, some of it just info that I need) in my daily life. A lot of this stuff is scribbled down onto scrap paper, into a notepad file, or onto my computer's desktop (more on that in a second), and unfortunately, much of it is forgotten. Of course, I know that I should transfer it into whatever its final form will be (be it a spreadsheet, Quicken, my Amazon list, an email, assignment, blog post, my cell phone) as soon as possible, but I rarely do so. It was good to hear this reiterated.

So, I decided to clear up some of the information that's clogging up my computer right now. I have a folder on my desktop devoted to 2.5 yrs worth of notepad files, ranging all the way from "$ spent updat" (which not only has records of money spent, but some weightlifting workouts and part of a to-do list??) to "YOU ABSOLUTELY MUST BY 11-05" (in reference to some magazine subscriptions I had to cancel to avoid getting charged). However, there are 318 files. So... I decided on an easier task to tackle first.

I have a program that lets me put virtual post-it notes on my desktop. These notes tend to accumulate and the problem is exacerbated because whenever I have too much info cluttering up the space, I just stack all the notes on top of each other and forget about the bottom ones. So I decided to clean this up and sort all the important data. After separating all of them, here's the diversity of information I found:

Distances to a bunch of landmarks on the levy bike path that I measured with Google Earth.
Training paces for various kinds of running (easy, tempo, long, speedwork, etc) calculated based on my mile time.
Data for some stuff I sold on ebay (listing fees, final value fee, paypal fee, shipping, price charged, profit).
No less than 48 (!) songs to download. This list has been in progress for a long time.
Information about a job on craigslist.
A list of features I was looking for in a heart rate monitor (despite having already bought one that satisfied these requirements nearly two months ago).
A to-do list (only one! a good step for me!)
Some motivational training quotes.
Six identifying phrases/sentences from different song lyrics so that I could google them to find out what the songs are, for eventual download.
A library call number. To what? Don't know.
Two phone numbers. Whose? Don't know.
Some order numbers. For what? Don't know.
Random workout information from ages ago, identified in time only by the day of the week since I suppose I expected to have it all entered in my spreadsheet well before I forgot what week it referred to.
A few websites I heard/read about in some context and planned to visit. Almost a year ago.
The titles of a few studies I wanted to look up.
A list of all the categories of information I planned to include in my excel workout log.
The numbers "12" and "6166537". Ok...

Kind of ridiculous. But after a bit of work I have it pared down to four notes: a to-do list, stuff related to exercise, stuff related to music, and a quote that I like to see.

If you stay on top of things - entering data as soon as you get it - you're more likely to maintain that, whereas if you make a habit of jotting down quick notes in different places without timely follow-up, that's something you're going to stick to as well. The conclusion I've come to is, if the information is important to you, and you're taking the time to monitor it - whether it be for research or just personal stuff - then it should be worth the extra 2 seconds (or even, god forbid, 2 minutes!) to make sure that it's in a format that will be useful to you. Now hopefully I'll be able to abide by this philosophy!

Monday, March 26, 2007

virtue and vice

So, keeping up with a blog is clearly not my forte. I am definitely going to make a concerted effort to stay more on track with this. The stupid thing is, I had tons of ideas for what I was going to make entries about, yet never actually did anything about it because I felt my thoughts weren't entirely fleshed out. The smart thing is, I did write down lots of notes about blog ideas so that I wouldn't forget about everything I'd come up with. The stupid thing is, I put these notes (along with a bunch of other important stuff for school) into my checked luggage over spring break. Which the airline proceeded to lose for EIGHT days. My bag was apparently sent to Birmingham, Alabama, because that's "close enough" to Burlington, Vermont. Thank you, Northwest. The smart thing is - well, not so much smart due to anything on my part so much as dumb luck - I finally got the suitcase back on Saturday (I wonder what portion of people never get their stuff back? Or how your odds of it being returned to you decrease with each passing day?). So I now have all my ideas back! But the stupid thing is, it's a little late for the kind of extended entry I was planning. So it'll have to wait. But I'll leave you with this interesting link that I found while researching some data for the take-home exam:

http://addictedtor.free.fr/graphiques/

It's a collection of graphs made using R, and the diversity and complexity of some (or most) of them is astonishing. I never would have thought it possible to make figures like those with R. Odd how a program that appears so "simple" (as compared to even something like Excel) is so powerful given the right knowledge about how to use it.

Monday, March 5, 2007

improbability

The summer after I graduated from high school I attended a summer program which is known among its alumni for having a number of crazy traditions, including some strange celebratory antics whenever someone has a birthday. The problem during my year? Not one person in the camp had a birthday the entire time. It was a science-oriented camp, so of course somebody had to calculate the probability of this happening. I don't remember exactly what it turned out to be (other than impossibly small), but during Thursday's class I realized that with R, I could figure it out in a matter of seconds.

For simplicity's sake I assumed that every day of the year is equally likely to be someone's birthday (which is not true, but close enough for this estimate). There were 97 delegates, plus 29 staph (not a typo - their enthusiasm is "infectious." ha ha.), for a total of 126 people. The program lasted almost four weeks, or about 26 days.

So, the probability of 0 occurrences of an event when there are 126 independent trials and the probability of the event in each trial is 26/365 is:

pbinom(0,size=126,prob=26/365)

Which turns out to be:

9.041928e-05

Or, less than 1 in 10,000!
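As a sanity check, the same number can be reached a couple of other ways. With zero successes the binomial formula collapses to (1 - p)^n, so the following sketch should give the same answer three times over. The variable names are just ones picked for this example, and it keeps the same simplifying assumptions as above (equally likely birthdays, independent people).

```r
n_people <- 126       # 97 delegates + 29 staff
p_camp <- 26 / 365    # chance one person's birthday falls within the 26 days

# Cumulative probability of at most 0 birthdays (the call used above).
p_none_pbinom <- pbinom(0, size = n_people, prob = p_camp)

# Probability of exactly 0 birthdays; identical, since counts can't go below 0.
p_none_dbinom <- dbinom(0, size = n_people, prob = p_camp)

# Directly: every one of the 126 people has to miss the 26-day window.
p_none_direct <- (1 - p_camp)^n_people

# Expressed as "1 in N".
one_in <- 1 / p_none_direct
```

Here one_in comes out to roughly 11,000, which fits the "less than 1 in 10,000" figure.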

Monday, January 29, 2007

Another possibility

I thought of another idea for the independent project: Somehow analyzing the multitudes of data in the training logs that I maintain religiously. I'm not really sure how this would work, or even if it would since the data is not from an experiment per se. I guess I will have to wait until I have a better idea of the types of statistical analyses we will be learning about; before then I don't really know what I am talking about... But regardless, it's an interesting idea and I would definitely have fun with it. I have already spent plenty of time making graphs to chart progress, finding formulas online to predict performance, etc.; at times probably to the extreme where it's too much about the numbers and not enough about the experience! Overall though, I enjoy having a concrete reminder of what I have accomplished, preferably in a form that can be analyzed (i.e. numerical, not descriptive). So I think that learning some new and different ways of looking at data will be very useful to me in the future.

Thursday, January 25, 2007

R and C++

I took a C++ programming course in high school and although I don't remember too many details from it, I think it will help me when learning how to use the program R. I've already noticed a number of similarities in the way you communicate with the computer and how it interprets commands: e.g. the comparison operators (<=, !=) are the same, along with the logical symbols (|, &), although in C++ the single | and & are bitwise operators and the logical versions are || and &&, which R also has. More of what I knew how to do four years ago is also coming back to me, like defining and working with variables. I also think that just the experience of having to convert my own thinking into a type of logic that the computer will understand will help me in working with R.
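To make the operator comparison concrete, here is a small R sketch with an arbitrary example vector. One difference from C++ worth noting: in R the single-character & and | work element by element on whole vectors, while && and || are meant for single TRUE/FALSE values, such as inside an if().

```r
x <- c(1, 5, 10)  # arbitrary example values

# Comparison operators give one TRUE/FALSE per element.
at_most_five <- x <= 5
not_five <- x != 5

# & and | combine those results element by element.
in_range <- x >= 2 & x <= 10
at_edges <- x < 2 | x > 9

# && and || expect single values, much like the C++ logical operators.
first_is_one <- FALSE
if (length(x) > 0 && x[1] == 1) first_is_one <- TRUE
```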