“The Butterfly Effect” of Small Data Errors

Do you remember that terrible movie with Ashton Kutcher in it? Sorry, do we need to be more specific?

The movie was based on the popular “Butterfly Effect” theory, which has become the other pop-culture digestible form of Chaos Theory from a figure NOT mauled by a T-Rex. The theory you might know goes like this: A butterfly flaps its wings on one side of the world, and it creates a tornado on the other side of the world. A small change somewhere leads to a large change somewhere else. Pretty straightforward, right? Unfortunately, that’s just about the opposite of what the theory actually posits.

As this Boston.com article points out, this interpretation “…get his insight precisely backwards. The larger meaning of the butterfly effect is not that we can readily track such connections, but that we can’t.” Who does “his” refer to in this quote? That would be Edward Lorenz, the mathematician who published the paper that all this stemmed from: “Predictability: Does the Flap of a Butterfly’s Wings in Brazil Set Off a Tornado in Texas?”

Your Data is Not a Tornado

The point of the original paper is that the weather is so complex, with so many factors, that it is impossible to predict. The tornado could have stemmed from the flap of the butterfly wing; it’s just something we can’t know for sure (and neither can meteorologists). To come close to accurately modeling the weather, we’d need great leaps in computing power. Some estimates put this at least 150 years away, if ever, and that’s taking into account how far we’ve come.

Luckily, your data is nowhere near as complex as the weather. With the analytics available today, you can see exactly what flap of a butterfly led to your tornado. The traceback here is entirely possible. Here’s a screenshot of a data error detection example:

From here, you’re able to see when the error was created, who created the error, what the error is, and on what job order. Of course, some of the fancier tools available can also algorithmically tell you how important the error is (high, in this case), provide a “best guess” from your historical data (20%, here), and provide the ability to send a note to the owner to fix the data error. Soon, you’re on your way to seeing this:

We’re big fans of data quality, and having an automated error detection tool in your analytics will keep small errors from growing to tornado-sized proportions. Sure, your data is complex, but it ain’t the weather. Take the time to scrub your data of errors to make the most out of your data intelligence.
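If you’re curious what an automated check like this might look like under the hood, here’s a rough Python sketch. To be clear, this is not how any particular product does it. The names (FieldEntry, DataError, check_entry), the fee_percent field, the sample values, and the 50% tolerance are all assumptions made up for illustration. The idea is simply to keep track of who entered a value, when, and on which job order, compare the value to your historical data, and offer the historical median as a “best guess.”

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import List, Optional


@dataclass
class FieldEntry:
    """One value someone typed into a record (hypothetical structure)."""
    job_order: str
    field: str
    value: float
    created_by: str
    created_at: datetime


@dataclass
class DataError:
    """What we report back when an entry looks wrong."""
    entry: FieldEntry
    description: str
    severity: str      # "high" or "low"
    best_guess: float  # median of the historical values


def check_entry(entry: FieldEntry, history: List[float],
                tolerance: float = 0.5) -> Optional[DataError]:
    """Flag an entry whose value strays too far from the historical median.

    `tolerance` is the allowed relative deviation (0.5 means 50%); it is an
    arbitrary threshold chosen for this sketch.
    """
    if not history:
        return None  # nothing to compare against
    typical = median(history)
    if typical == 0:
        return None  # relative deviation is undefined; skipped in this sketch
    deviation = abs(entry.value - typical) / abs(typical)
    if deviation <= tolerance:
        return None  # close enough to the usual value
    severity = "high" if deviation > 2 * tolerance else "low"
    description = f"{entry.field} of {entry.value} is far from the usual {typical}"
    return DataError(entry, description, severity, typical)


# Made-up example: a fee entered as 200% when past fees hover around 20%.
entry = FieldEntry("JO-1001", "fee_percent", 200.0, "jsmith",
                   datetime(2015, 3, 2, 9, 30))
error = check_entry(entry, history=[20.0, 18.0, 20.0, 22.0, 20.0])
if error is not None:
    print(f"{error.severity.upper()}: {error.description} "
          f"(job order {entry.job_order}, entered by {entry.created_by} "
          f"at {entry.created_at:%Y-%m-%d %H:%M}); best guess: {error.best_guess}")
```

A median is used for the best guess because one earlier typo in the history won’t drag it around the way an average would. A real tool would likely weigh more signals than this, but the traceback idea is the same: every flagged value carries its own who, what, when, and where.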

Want to learn more about our data quality monitoring and other features?