Understanding the World with Models – A Conceptual Introduction

My grandfather took me to a model train museum when I was a kid. I remember walking (or running) around and following the trains as they chugged along. I was never a train enthusiast, but I did think the models were neat! Over 20 years later, it turns out models are kinda how humans make sense of the world.

A model is a representation of something. The small train above is a model of a real train just like a globe is a model of the Earth. A globe isn’t the real Earth, but it conveys some truth about the location of land and water on the real Earth. A 2D map is also a model of the Earth! But why do we have two different models; why not just have the one model?

Most models are simplifications, so you lose some accuracy, but the simplification should serve a purpose. A globe simplifies true size (it is scaled way down), but it preserves the relative location and size of land and water. A common 2D map, the Mercator projection, goes further and also gives up relative size (land closer to the poles is stretched and looks bigger than it really is). But it was historically very useful for navigation at sea.
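The polar distortion can be put in numbers. As a rough sketch, assuming a perfectly spherical Earth (a simplification of its own!), the Mercator projection stretches lengths at a given latitude by 1/cos(latitude) in both directions, so areas are inflated by 1/cos²(latitude) relative to the equator. The function name below is just something I made up for illustration.

```python
import math

def mercator_area_inflation(latitude_deg: float) -> float:
    """How many times larger an area looks on a Mercator map than at the equator.

    Assumes a spherical Earth: lengths are stretched by 1/cos(latitude)
    in both directions, so areas grow by 1/cos(latitude) squared.
    """
    lat = math.radians(latitude_deg)
    return 1.0 / math.cos(lat) ** 2

for lat in (0, 30, 60, 75):
    print(f"{lat:>2} degrees: areas look {mercator_area_inflation(lat):.1f}x too big")
```

At 60 degrees latitude, areas come out about 4 times too big. The map is "wrong" about size on purpose, because that trade buys something useful for navigation.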

A model isn’t only defined by absolute accuracy. Interpretability, the ease with which we can use the model, is also a core feature. Obviously, if a model is simply wrong, it doesn’t matter how easy it is to use or interpret. But if a model is useful for its purpose and isn’t too wrong, it’s a good model.

The idea of a model may still sound abstract, but you actually use models all the time in your daily life. Think about a close friend: how would they react if you gave them an unexpected gift? The person in your mind and how you think they’d react is a mental model that represents your friend. This model is just like the model train or a globe: you lose some information (it’s not as accurate as simply seeing what your friend does), but ideally your mental model covers some core details. Hopefully.

Comparing mental models of how the mind works is fun, but it’s harder to figure out who is “right.” Instead, most psychologists test statistical models. A statistical model is still just a model; it just happens to be a mathematical model instead of a purely abstract or conceptual model. What makes it mathematical is specifying the numbers involved. So perhaps I believe that men are, on average, taller than women. Mathematically, I’d say: μ_MaleHeight > μ_FemaleHeight, or equivalently, μ_Difference = μ_MaleHeight - μ_FemaleHeight and μ_Difference > 0.
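To make the height claim a little more concrete, here is a minimal sketch of checking whether the average difference is greater than zero. Everything in it is an assumption for illustration: the heights are simulated from made-up population values (not real data), the function name is mine, and the p-value uses a normal approximation that is only reasonable because the samples are large.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the simulated sample is reproducible

# Made-up population values in cm (assumptions for illustration, not real data).
male_heights = [random.gauss(175.0, 7.0) for _ in range(200)]
female_heights = [random.gauss(162.0, 7.0) for _ in range(200)]

def mean_difference_z(a, b):
    """Return (difference in sample means, that difference divided by its standard error)."""
    diff = statistics.mean(a) - statistics.mean(b)
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return diff, diff / se

diff, z = mean_difference_z(male_heights, female_heights)
# One-sided p-value for "difference > 0", using a normal approximation.
p_value = 1.0 - statistics.NormalDist().cdf(z)
print(f"estimated difference: {diff:.1f} cm, z = {z:.1f}, one-sided p = {p_value:.3g}")
```

The sample difference is only an estimate of μ_Difference, which is why the standard error matters: it says how much that estimate would wobble from sample to sample.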

Statisticians and others have established a bunch of rules (and more math) about these models that allow us to test them. In this post, I’m not focusing on those specifics, but on something broader that is absolutely essential to properly using statistics: we have to judge how good the model is before trusting what it says.

Think of a map (a model of a city’s layout). When we’re judging a map, the core criterion is that stuff should be on the map where it would be from a bird’s-eye view. A map of a city could include information about the height of buildings, if you wanted. It could even be presented beautifully. People can marvel over how advanced and fancy this new map looks. And yet, if the map tells you to walk down a street that doesn’t exist, it doesn’t matter how impressive the additional information is: it’s a bad model.

When we’re judging statistical models, there are also important features we need to pay attention to that, frankly, a lot of people don’t even check. I really want to emphasize the importance of actually checking that your model is a good one. It really comes down to appropriateness.

What do I mean by appropriateness? Let’s say I made a model of diabetes to try and find a new treatment. Excitedly, I tell you I have found a cheap treatment using my model. With great foresight, you ask not only to see my key statistics (effect size, t-statistic, p-value), but my model. Look, here it is!

Choo-Choo: a good model for a train, a bad model for diabetes

Who cares if the insights from this model are “significant,” have “large effect sizes,” or are “cheap”? It’s a really stupid model, and there’s no way this model train actually portrays anything about diabetes.

If you don’t check the assumptions of the statistical model you’re running on your data, you could be making just as absurd a claim without knowing it. Unfortunately, statistical training in psychological science is kind of all over the place, I think. So what violating these assumptions really does, and how to even check them, probably isn’t properly emphasized. And, especially for complicated (fancy) analyses, checking is honestly pretty hard. But that’s the nature of science: it’s hard.
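As a very rough idea of what "checking" can look like, here is a sketch for the simple case of comparing two group means, where common assumptions include roughly bell-shaped data and similar spread in each group. The function names, the cutoffs, and the simulated data are all my own assumptions for illustration; real checks usually involve plotting the data and more careful diagnostics than this.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: near 0 for symmetric data, large when one tail is long."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def rough_assumption_check(group_a, group_b, max_skew=1.0, max_var_ratio=4.0):
    """Flag obvious departures from 'roughly symmetric, similar spread'.

    The cutoffs are arbitrary rules of thumb chosen for this sketch.
    """
    variances = sorted([statistics.variance(group_a), statistics.variance(group_b)])
    return {
        "skew_a_ok": abs(skewness(group_a)) <= max_skew,
        "skew_b_ok": abs(skewness(group_b)) <= max_skew,
        "similar_spread": variances[1] / variances[0] <= max_var_ratio,
    }

random.seed(0)
bell_shaped = [random.gauss(50, 10) for _ in range(500)]              # symmetric
long_tailed = [20 * random.lognormvariate(0, 1.5) for _ in range(500)]  # heavy right tail

check = rough_assumption_check(bell_shaped, long_tailed)
print(check)
```

Here the long-tailed group fails both the skew and the spread check, which is a hint that a method built for bell-shaped data may be the wrong model for it.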

In summary, please check you’re not modeling diabetes with a model made for trains. Of course, I’ll go over how to do this for things I eventually post, to the extent I can. I’m still learning a lot myself!

Observing Reality – Basic Concepts

In my first post, about when something is true, I said something is true when it explains and predicts measurable/observable reality consistently. My goal here is to broadly discuss how complicated measurement/observation truly is. Throughout, you’re definitely getting my personal opinion on a lot of issues in psychological science: I often take a critical position. There are scientists who do not agree with me and some who do. But either way, I wanted to be clear that those are my opinions.

There are enormous obstacles to discovering “core truths” about human psychology, including philosophical questions about whether such truths exist. Here, I take the position there are some kinds of truth we can learn, and I also think the obstacles are not impossible to overcome. I do think that we would all benefit from carefully considering some basic issues more thoroughly. We must constantly challenge the validity of our constructs, a fancy word that basically means “concept” (e.g., happiness is a construct, and so is extraversion). In particular, we must be critical of how we measure our constructs and how much information those measures really provide. If we don’t, the best case is that we’re making an imprecise triangulation of the truth; at worst, we’re describing things that may be mathematically consistent but don’t relate to actual human psychology, the thing we claim to be studying.

For simple gravity on Earth, observation may seem straightforward: junk falls. But the concept of observation can be complicated. Consider how exactly you should measure the time it takes for objects to fall:

Unit: are we measuring in seconds? minutes? jiffies?
Observers: is the measurement collected by a person on the ground with a stopwatch? the person dropping the object? a sophisticated laser? How do we calibrate the equipment? If I want to perform the same study, how do I ensure my equipment is calibrated the same way as the original? Do we have a universal clock that everyone agrees measures the “correct” unit of seconds? That’s not a trivial problem! In modern times, we have atomic clocks that vote on the time and determine it through consensus.
Outcomes: are we measuring time until first contact with the ground? What is “first contact?” Electrons don’t normally touch in day-to-day life. Touching becomes sort of a non-concept at small scales anyway. The range of electromagnetic influence is also infinite, so they’re theoretically always in some kind of “contact.” We’ll have to pick some boundary that counts as a finish line, probably. People may argue about how we select the boundary or whether the boundary is meaningful. This is more of an issue with psychological constructs. The boundaries of what we mean are very important.
Difficulty of observation: some events are rare. Some are incredibly difficult to measure. Imagine needing to build miles of a particle accelerator just to hope to measure tiny, tiny particles we haven’t seen before. Maybe. In the psychological domain, observation is tricky because people often know they’re being observed!

For observations of gravity on the scale of human-sized reality, physicists have most of these issues solved. I think what often separates psychological science from hard sciences is that we have a lot of trouble with all these. And we never have perfect experimental control. I’m not allowed to assign my participants to random parent conditions at birth, and I can’t randomize their political beliefs or their religion either. Here are some other examples:

We’re assuming that a lot of abstract concepts are real and CAN be measured (e.g., that “love” is a thing, that I can measure how much you feel it, and that it’s distinct from the idea of liking).
Just because we can construct sentences we understand doesn’t mean they have a clear definition. E.g., I want to study “good conversations.” Easy to say, but what does that mean? Is the quality of a conversation determined by the people talking or by “objective” outside observers? Is it according to how much they like each other at the end? Whether they talk again later? The length of time they spoke? How do we factor in “depth” of the conversation? Does what counts as a “good conversation” depend on demographics (cultural changes related to generations, locations, shared history)? Maybe there’s no such thing as a good conversation. Imagine I have two people who went through a conversation and hated it. Maybe I can take two new people, have them engage in exactly the same conversation, and yet these new people will absolutely love it. So it’s really about the people, not the conversation.
There are standard units of time, but we lack truly standardized units for many of our phenomena (what is the standard unit of feeling happy? I vote for 7.5 smiley faces out of 7.5). This complicates comparing studies that use different kinds of measures. It also doesn’t help that we know people respond to scales differently (asking on a 1 to 7 scale can produce quite different answers than 1 to 100).
On that point, math doesn’t always make sense for our measurements: is a 10 on a happiness scale 2 times as much as a 5? Rarely, maybe never. Does one person’s 10 equate to another person’s 10? Probably not. So person A’s 5 may actually be “better” than person B’s 6. Even within a person, how consistently do they use the scale? Is my 5 on Monday truly the same as my 5 on Saturday?
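The idea that person A’s 5 may be “better” than person B’s 6 can be illustrated with a small sketch. One option (my choice here, not the only one) is to re-express each rating relative to that person’s own typical answers. The ratings are hypothetical, the function name is made up, and the approach assumes each person uses the scale consistently over time, which is exactly the kind of assumption questioned above.

```python
import statistics

# Hypothetical happiness ratings (1 to 7 scale) from two people across a week.
ratings = {
    "person_a": [2, 3, 2, 3, 2, 3, 5],  # rarely goes above 3
    "person_b": [7, 7, 6, 7, 7, 7, 6],  # almost always near the top
}

def within_person_z(scores):
    """Re-express each rating relative to that person's own mean and spread."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]

z_a = within_person_z(ratings["person_a"])
z_b = within_person_z(ratings["person_b"])

# Compare the last day: A answered 5, B answered 6.
print(f"A's 5 -> z = {z_a[-1]:+.2f}")
print(f"B's 6 -> z = {z_b[-1]:+.2f}")
```

On the raw scale B’s 6 beats A’s 5, but relative to their own habits, A’s 5 is an unusually good day and B’s 6 is a below-average one.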

That’s just a sampling of what makes psychological science hard, and some people think it’s impossible. As someone who does psychological research, I currently don’t think it’s impossible, but I’m sympathetic: we haven’t always done a good job tackling these first issues. In fairness, they are really difficult to solve. But still, as the people claiming to do psychological science, it’s literally our job to solve them.

If you’re into this kinda thing, these problems actually all relate to core statistical and research concepts.

Content and Construct validity: do our observations (measurements) actually capture the construct/concept we are trying to capture? Is it missing pieces of it? It would be weird, for example, to claim I’m measuring weight by only asking for your height. There actually is a relationship between weight and height, so at least I’m not asking for something totally irrelevant like your favorite color. But clearly I’m missing the most important aspect of the thing I want to study, and there are ways in which height relates to stuff that weight doesn’t. By using imperfect measures of the thing we really mean, we may be misled. Personally, I believe this is a big problem in social psychology. E.g., we study “culture” mostly by asking for people’s ethnic ancestry or country of origin. This doesn’t necessarily mean the results are invalid: even mediocre measurements still provide “signal” to the truth, provided you gather a lot of data. But I’ll have to talk about that later.
Test-retest Reliability: if you say your personality on some scale is a 7 today, by the theory of personality, you should give a very similar answer tomorrow. And five weeks from now. And two months from now. Similarly, a scale that measures “weight” but randomly adds or subtracts 0 to 100 pounds is not a very good scale. It would be hard to interpret a single measurement (though you could still interpret the average of many measurements). The same applies to our measures. They need to be consistent to be interpretable.
Internal reliability: If we’re averaging a bunch of stuff together in some way, do these things actually go together? E.g., if I’m measuring your height with two different measuring tapes, they may disagree a little (measurement error), but I’m pretty confident they’re measuring the same basic concept (distance), and therefore it is fair to average the two answers together into a single (ideally more accurate) score. Internal reliability, however, is a mathematical kind of reliability. We calculate how reliable our scales are. Mathematical reliability doesn’t mean the scale is a good scale. Sometimes people forget this.
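To show what “calculating” internal reliability can look like, here is a minimal sketch of one common coefficient, Cronbach’s alpha. I’m picking alpha as an example; it is not the only way to quantify internal reliability. The item scores are hypothetical.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of scores (one per respondent)."""
    k = len(items)
    # Each respondent's total score across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(statistics.variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / statistics.variance(totals))

# Hypothetical answers from five respondents to three questionnaire items.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.2f}")
```

A high alpha only says the items rise and fall together across people. It says nothing about whether they measure the thing we claim to be measuring.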

I can only speak about social psychology, but we all tend to study our own things. So we have similar constructs but we call them by different names and use different measures. Sometimes our measures have passed some test of validity and reliability, but sometimes they’re still just convenient instead of accurate (my pet peeve is treating race as an indicator of culture in the US; another pet peeve is that I dislike the phrase “pet peeve”). With the proliferation of technology and computing, I think we can do a lot better than most people realize. It’s just a matter of introducing these advances to a field that isn’t primarily made of statisticians or computer scientists.

Maybe someday I’ll actually take on the job of sifting through our myriad measures and seeing what’s what myself. I actually think that’d be pretty neat. And as a final thought, even if I came across as very critical, I actually am fairly optimistic that these problems can be solved. Whether they will be is up to us (the scientists). And I think we are improving over time. Just like there’s a lag between scientific advances and communication to the broader public, there’s a lag between advances within psychological science and communication to other psychologists.

When is something true?

What color is the sky on an average, cloudless day? Blue is a fair answer, but how do we know that this is true? How do we know anything is true?

Maybe it’s strange to start off with a philosophical question, but it’s a vital first step that helps bridge the gap between thinking about human psychology and psychological science. At the heart of science is the method used to decide what’s true. For anyone who has an introductory knowledge of epistemology (philosophy of knowledge) or philosophy of science, well, you probably know more than I do. But I figured I should start here.

Science, like anything else, has some assumptions. One: the effects we observe in the world have causes (cause and effect exist). Two: observation (and measurement) gives us information about cause and effect. Three: the relationship between cause and effect has a stable pattern. So putting it all together, something is true when it explains and predicts measurable/observable reality consistently. More accurately, it’s the closest approximation of the truth until a better idea improves or replaces it.

There are alternative ways to view the world, but the most scientific aspect of psychology is its use of the scientific method and the mutual understanding among scientists about what constitutes knowledge. If an idea fails to predict patterns in measurable reality, it just doesn’t pass the test. If a new idea makes better predictions, explains more or fixes inconsistencies, it replaces older ideas.

So how can we apply these concepts to the color of the sky? One of the first tasks is to determine what exactly we’re observing/measuring when we ask the question. There’s the physical aspect of the light, but that’s not what we really asked. We asked for the color: a subjective experience. How do we establish “truths” about something subjective?

There’s a history of debate about what psychology should study. There was a time the behaviorists won most arguments: psychologists should exclusively study what is observable externally (behavior). But modern psychology deals with the ABC’s of the mind: affect (feeling/emotion), behavior and cognition (thought). And it turns out there ARE patterns in subjective experience insofar as we can measure them. Our fully colorblind friends see the same dominant wavelengths of light in the sky (450-485 nm), but they lack subjective perception of color. Even if all we did was ask people to “name the color you personally perceive the sky to be,” we would find at least two groups of people: those who call it blue and those who do not.

This is the type of knowledge we can find about subjective experience: kinds (categories) of people, dimensions of experience (e.g., light intensity from dark to painfully blinding), correspondence between cause (wavelength of light) and effect (color perception). Inconsistencies merely represent research opportunities: is there another group of people we aren’t accounting for? Is there another factor that shapes color perception? Is it the perception that is truly different, or is it the way people communicate their perception? Is there a better way of thinking about color perception? Or perhaps the way we measure it needs to be improved?

So when is something true, and is the sky blue or not? It’s true in psychological science when the data shows a consistent pattern, preferably in multiple sample groups collected in a rigorous way. Interpreting the results, however, is tricky. It’s not like we can say “the sky is definitely blue” or “it definitely is not.” Instead, we have to talk about kinds of people and dimensions of possible perception. The sky is blue-ish for most color-seeing folk, but some see it as light grey. Does this complication mean there was nothing learned? I would say no; we definitely learned something about human minds. It may be complicated, but the answers should be complicated because people are complicated.