
An Introduction to Data Science

By now, we hope you know that MerchantWords derives its search estimates through a combination of Amazon’s information and data science. What is data science, though?

It may sound like one of those super-cool, slightly sci-fi things, like self-driving cars. But to us, it’s one of our most important tools. In fact, data science is so much a part of what we do at MerchantWords that we thought we’d give you a peek behind the curtain.

Data science came about with the rise of technology. As more people used computers, especially with the advent of smartphones, information began accumulating at an astounding rate. According to Eric Schmidt, the former Google CEO, “every two days now we create as much information as we did from the dawn of civilization up until 2003. That’s something like five exabytes of data.”

That statistic is from 2010. Can you imagine how much more data we produce now? According to a study by research firm Dscout, the average person makes more than 2,500 clicks just on their smartphone each day. Each click generates data. Think about that the next time you like away on Instagram!

Most people agree that the first organizations to look at this growing pile of data and say, “Hey, this could be useful!” were Google, Facebook, and LinkedIn. (We never would have guessed.) They made some of the first attempts at pulling together teams to make sense of the vast amounts of information being collected in cyberspace. In turn, those teams have created a great deal of value for companies and their customers.

Components of Data Science

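[Figure: Venn diagram of data science, showing the overlap of math and statistics, hacking skills, and substantive expertise]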

Math and Statistics: In data science, statistics is used to ask questions about the world, and data is used to find the answers. The data is observed, and mathematics helps detect different shapes and trends within the data set.

Data scientists take advantage of the fact that many events can be predicted by probability theory. Let’s break it down with a simple example:

You’re walking around your hometown, and you look around and say to yourself, “I think most people here are 5’5”.” The next step is to go out and collect information. You spend an entire day walking around with a measuring tape, recording the height of everyone you see. That night, you go home and pull together all the data you collected. Looking at the data, you’ll find you were either right or wrong. In this case, most people in your hometown are 5’5”, and you were correct. Then you calculate how likely it is that this average height holds only for your town and not for others. The next step is important: you have to ask more questions. Is this the average height just in my town, or does it also apply to the next town? What about the city? The state? The country?
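To make the height example a little more concrete, here is a minimal Python sketch of that evening's number crunching. The measurements are made up for illustration, the function name is our own, and the 1.96 multiplier is a rough normal approximation (a sample this small would usually call for a t-distribution), so treat it as a sketch rather than a recipe.

```python
import math
import statistics

def summarize_heights(heights_in):
    """Return the sample mean and an approximate 95% confidence interval."""
    n = len(heights_in)
    mean = statistics.mean(heights_in)
    stdev = statistics.stdev(heights_in)      # sample standard deviation
    stderr = stdev / math.sqrt(n)             # standard error of the mean
    margin = 1.96 * stderr                    # assumption: normal approximation
    return mean, (mean - margin, mean + margin)

# Hypothetical measurements in inches (5'5" is 65 inches).
sample_heights = [63, 65, 66, 64, 65, 67, 65, 64, 66, 65]

mean, (low, high) = summarize_heights(sample_heights)
guess = 65
guess_is_plausible = low <= guess <= high

print(f"Sample mean: {mean:.1f} in, approx. 95% interval: {low:.2f} to {high:.2f} in")
print("Guess of 65 in is plausible:", guess_is_plausible)
```

Running the same function on measurements from the next town over would be one way to start answering the follow-up questions.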

Data scientists do this on a significantly larger scale and with more numbers than you could collect in a single day. They analyze this data over and over until they finally get an accurate picture and ascertain the information or answer they need. There are lots of different ways to understand the world; data scientists happen to do it through numbers and probability. These math muscles help them tackle the next part of our Venn diagram: hacking skills.

Hacking Skills: First, you should know the difference between programming and hacking. Programming is the act of talking to a computer in a language that it understands. Hacking takes programming to a different level. Although the word gets a bad rap from TV shows and movies, it isn’t about breaking into someone’s computer and stealing their information. Hacking is the ability to creatively use your computer to figure out an innovative solution to a complex problem. Think about applying what you do in Excel to large swaths of information, putting that on super-steroids, and you’ve almost got hacking. Or at least, the concept.

A data scientist uses computer programming skills and math expertise to develop algorithms that can quickly and efficiently scour, clean, and analyze data on a massive scale and, hopefully, provide answers and insights that can’t be ascertained by human calculations alone. This detailed process requires constant analysis, fine-tuning the algorithm, and then checking back to see what it found on the second, third, and fourth try. Each time, you get a different window into your data set that can inform the next question you ask.
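As a tiny illustration of the "clean, then analyze" loop, the sketch below tidies up a handful of messy search strings and counts how often each one appears. The sample queries and function names are hypothetical, and this is not a description of how MerchantWords processes its data; real pipelines handle far more volume and many more edge cases.

```python
from collections import Counter

def clean_query(raw):
    """Normalize one raw search string: trim, lowercase, collapse extra spaces."""
    return " ".join(raw.strip().lower().split())

def count_queries(raw_queries):
    """Clean every query, drop the empty ones, and count what is left."""
    cleaned = (clean_query(q) for q in raw_queries)
    return Counter(q for q in cleaned if q)

# Hypothetical raw data with inconsistent casing and spacing.
raw_queries = ["  Inner Tube", "inner tube ", "INNER  TUBE", "",
               "pool float", "Pool Float", "   "]

counts = count_queries(raw_queries)
top_queries = counts.most_common(2)
print(top_queries)
```

Without the cleaning step, the three spellings of "inner tube" would be counted as three different searches, which is the kind of thing a second or third pass over the data tends to reveal.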

Substantive Expertise: While data scientists understand all this information, most of us don’t. An essential part of collecting all that information is distilling it so that the insights can be conveyed to others. Let’s go back to our height example. If you gathered all this data for a company building houses in the area but didn’t know how this information could help the company or couldn’t effectively communicate your findings, what would be the point of collecting and analyzing that data? A data scientist knows not only what the information is saying, but also why it’s useful to the company.

We’re in the business of buyer keywords and search estimates, so we’ll use that as an example. If you see that customer searches for inner tubes have spiked from May through August for the past three years, you can reasonably conclude that inner tubes are a summer product. This trend can be useful to people in product development, sales, and marketing as they build company strategies. Data scientists need the business acumen to notice when a pattern can be helpful to their company and what data points can forecast future trends.
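Here is one way the inner tube observation could be checked in code. This is a minimal sketch with made-up monthly search counts, hypothetical years, and an arbitrary threshold of 2x; it simply compares average May through August searches with the rest of each year and is not MerchantWords' actual method.

```python
from statistics import mean

SUMMER_MONTHS = {5, 6, 7, 8}  # May through August

def summer_lift_by_year(monthly_searches):
    """Map each year to (average summer searches) / (average other-month searches).

    monthly_searches: dict of (year, month) -> search count.
    """
    lifts = {}
    for year in sorted({y for y, _ in monthly_searches}):
        summer = [c for (y, m), c in monthly_searches.items()
                  if y == year and m in SUMMER_MONTHS]
        other = [c for (y, m), c in monthly_searches.items()
                 if y == year and m not in SUMMER_MONTHS]
        lifts[year] = mean(summer) / mean(other)
    return lifts

# Hypothetical search counts for January..December, repeated for three years.
monthly_pattern = [100, 100, 120, 150, 400, 600, 650, 500, 150, 110, 100, 100]
searches = {(year, month): count
            for year in (2016, 2017, 2018)
            for month, count in enumerate(monthly_pattern, start=1)}

lifts = summer_lift_by_year(searches)
# Assumption: call it a "summer product" if every year shows at least a 2x lift.
is_summer_product = all(lift >= 2.0 for lift in lifts.values())
print(lifts, is_summer_product)
```

Requiring the lift in every year, rather than on average, is a deliberate choice here: a one-off spike in a single summer would not pass the check.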

When you combine the three skills of math and stats, hacking, and substantive expertise, you get data science. By bringing together the knowledge from all of these subject areas, data scientists garner unique insights they couldn't gain otherwise.

Having data scientists on our team at MerchantWords allows us to not only provide you with the most extensive, most accurate database of keyword search results, but also to help you see trends that affect sales, ad campaigns, and new product launches.