Data Demystified: Algorithms at ZoomInfo

From Google search results to stock market trading, algorithms have reshaped virtually every aspect of society. 

Yet despite their ubiquity, algorithms remain misunderstood by many — even by people whose jobs rely heavily on algorithms and related technologies, such as machine learning. 

As a global go-to-market platform, ZoomInfo invests significant time, effort, and resources into developing sophisticated algorithms that offer our customers more accurate data and better solutions. But how exactly do our algorithms work, and what do we use them for? 

Algorithms 101

At its simplest, an algorithm is a set of instructions that tells a computer how to handle certain actions in order to solve a specific problem. The result can be delivered directly to an end user, such as the results page shown to a person using a search engine, or used as the input for further calculations that solve more complex problems.

The concept is often illustrated by comparing algorithms to recipes. But while the simplest algorithms can be described as a plain series of steps, most rely on if-then conditional logic: if a specific condition is met, then the program responds accordingly.

Take a routine action such as crossing the street. To the human mind, this action is so common that we barely give it any real thought beyond the obvious question of whether it’s safe to cross. A computer can also evaluate whether it’s safe to cross the street, but it has to be told how to do so. This is where algorithms come in.

The many factors that go into crossing the street represent individual data points a computer needs to process to arrive at the desired output:

  • What type of street are you crossing? How many lanes of traffic are there? 
  • Is there a crosswalk? Will you use it, or cross elsewhere? 
  • If you’re using a crosswalk, will you wait for the “walk” signal, or cross when there are no cars coming? 
  • How many cars typically drive down that street? How fast do they tend to move? 
  • What time of day is it? Does this affect how many cars are on the street?
  • Are you the only pedestrian crossing the street? Are there multiple people crossing the street?

Since computers only “know” what we program them to know, even the simplest actions can quickly become more complicated than they might appear. 

Conditional logic can complicate algorithms even further. In our street-crossing example, conditional logic might dictate that if there are five seconds or less remaining on the crosswalk’s walk signal, then we should not attempt to cross and should instead wait for the light to change again.
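To make this concrete, here is a minimal sketch of that decision in Python. The five-second threshold comes straight from the example above; the factor names and the second rule are illustrative, not part of any real system:

```python
def safe_to_cross(has_crosswalk: bool,
                  walk_signal_on: bool,
                  seconds_remaining: int,
                  oncoming_cars: int) -> bool:
    """Decide whether to cross the street using simple if-then rules."""
    if has_crosswalk and walk_signal_on:
        # The rule from the example: with five seconds or less left on
        # the walk signal, wait for the light to change again.
        return seconds_remaining > 5
    # No usable signal: only cross when no cars are approaching.
    return oncoming_cars == 0

# Three seconds left on the signal, so the answer is "wait."
print(safe_to_cross(has_crosswalk=True, walk_signal_on=True,
                    seconds_remaining=3, oncoming_cars=0))  # False
```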

This complexity, however, is what allows the machine-learning systems behind “thinking” computers to learn over time as they evaluate new data and solve increasingly complex problems.

The Importance of Quality Data

Algorithms can be compared to recipes, but even master chefs can’t prepare delicious meals with poor ingredients. Similarly, it doesn’t matter how sophisticated an algorithm may be if the underlying data is inaccurate or incomplete.

Amit Rai, vice president in charge of enterprise product and sales at ZoomInfo, says that solving the problem of inaccurate, incomplete B2B data simply hasn’t been a priority for most companies. 

“Go back in time to the 1970s,” Rai says. “In the B2B world, there was no one organizing the world’s business information. The gathering method was calling businesses and self-reported surveys. Because this method remains prevalent, your match rates are poor. You don’t have good coverage for smaller businesses, because smaller businesses aren’t calling you and telling you who they are, their annual revenue, and their industry. You are relying on someone to tell you what their industry classification is.”

ZoomInfo’s algorithms and machine-learning technologies are solving this problem of inaccurate, incomplete B2B data. By training machine-learning models to recognize specific terms and phrases, algorithms can begin to correctly classify businesses that would never respond to cold calls or submit self-reported surveys.
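As a rough illustration of the idea (and not ZoomInfo’s actual models), a simple text classifier can learn which terms and phrases signal a given industry. This sketch uses scikit-learn and made-up training data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: website snippets paired with industry labels.
descriptions = [
    "family-owned bakery offering fresh bread and custom cakes",
    "full-service plumbing repair and pipe installation",
    "boutique law firm specializing in commercial litigation",
    "artisan coffee roaster and espresso bar",
]
industries = ["Food & Beverage", "Construction", "Legal", "Food & Beverage"]

# TF-IDF turns terms and phrases into features; the classifier learns
# which of them signal each industry label.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(descriptions, industries)

# Classify a business that never answered a survey or a cold call.
print(model.predict(["neighborhood cafe serving pastries and espresso"]))
```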

However, more data doesn’t always mean better data. That’s why ZoomInfo’s engineers and data scientists train their models to recognize the “Super Six” attributes — name, website, revenue, employees, location, and industry — to start building current, more complete profiles of even the smallest businesses.

“These Super Six attributes are so important because, regardless of whether a business has a big web presence or a large digital footprint, these are the core attributes that they’ll have in some shape or form,” Rai says. 
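In code, you can picture a Super Six profile as a simple record. This dataclass is purely illustrative, not a ZoomInfo schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CompanyProfile:
    """The 'Super Six' firmographic attributes described above."""
    name: str
    website: Optional[str]    # even very small businesses may lack one
    revenue: Optional[float]  # often estimated for private companies
    employees: Optional[int]
    location: str
    industry: str
```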

Inaccurate data doesn’t just create problems in terms of how it can be used. It also creates a problem of trust in data vendors. Many companies have been burned by legacy data vendors selling expensive, incomplete datasets that are of little use to sales and marketing teams.

Putting the Puzzle Together 

Rai was previously chief operating officer of EverString, which ZoomInfo acquired in November 2020.

EverString built a company-graph data product that mapped out the complex relationships between businesses, with an emphasis on very small businesses, which often have the least available data. Initially, the company set out to become the leading player in the emerging field of predictive marketing: using machine-learning models to anticipate the behavior of commercial entities. 

However, it soon became clear that the nascent field of predictive marketing was unlikely to mature. The problem wasn’t the lack of data — far from it — but rather the quality of the B2B data available. Most legacy data vendors were sourcing B2B data from older datasets, such as credit reports, risk analyses, and legal compliance data. Important firmographic data, such as employee count, was often inaccurate or missing altogether.  

“What we found was that many of these data vendors had been in the industry forever,” Rai says. “Other data vendors were resellers of the exact same data. Even though you think, as a buyer, you’re purchasing data from multiple data vendors, you’re purchasing the exact same data.”

Rai soon realized that data from legacy vendors often lacked the core Super Six attributes that are fundamental to high match rates and superior data fidelity. 

For companies with up to 20 employees, datasets from legacy vendors had a Super Six attribute match rate of just 10 percent, so low as to be virtually unusable. This represented an enormous opportunity, and it is where advanced algorithms truly shine. The entity resolution (or matching) algorithms developed by the team were sophisticated enough to construct highly granular profiles of SMBs that, in some cases, were so small they lacked even their own website. 
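Entity resolution is a deep topic in its own right, but the core idea can be sketched simply: normalize a few identifying fields, then score how strongly two records agree. The rules below are illustrative placeholders, far cruder than production matching logic:

```python
import re

def normalize(value: str) -> str:
    """Lowercase, strip punctuation and common legal suffixes."""
    value = re.sub(r"[^a-z0-9 ]", "", value.lower())
    return re.sub(r"\b(inc|llc|ltd|corp)\b", "", value).strip()

def match_score(a: dict, b: dict) -> float:
    """Fraction of key attributes on which two records agree."""
    keys = ["name", "website", "location", "industry"]
    hits = sum(1 for k in keys
               if a.get(k) and b.get(k)
               and normalize(a[k]) == normalize(b[k]))
    return hits / len(keys)

record_a = {"name": "Acme Plumbing LLC", "location": "Portland, OR",
            "website": "", "industry": "Construction"}
record_b = {"name": "ACME Plumbing", "location": "Portland OR",
            "website": "", "industry": "construction"}

# Treat the pair as the same entity above some threshold, e.g. 0.5.
print(match_score(record_a, record_b))  # 0.75
```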

By focusing primarily on the Super Six attributes, Rai and his team were able to achieve a near 100 percent fill rate on firmographic data fields. Combined with ZoomInfo’s vast datasets, their results were phenomenal.
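Fill rate here simply means the share of records where a given field is populated. A quick sketch of the metric, with made-up records:

```python
def fill_rate(records: list[dict], field: str) -> float:
    """Share of records with a non-empty value for the given field."""
    filled = sum(1 for r in records if r.get(field))
    return filled / len(records)

records = [
    {"name": "Acme Plumbing", "employees": 4},
    {"name": "Bayside Cafe", "employees": None},
    {"name": "Cedar Law", "employees": 2},
]
print(fill_rate(records, "employees"))  # 0.666... (two of three filled)
```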

“Suddenly, we were able to fill in information about those Super Six attributes for every record,” Rai says. “Clients were able to join those other data attributes with the Super Six. Suddenly, their models started performing 300 percent better than they had before, and that resulted in billions of dollars in additional revenue.”

Technical Expertise and Human Insight, Working Together

One of the biggest challenges faced by ZoomInfo’s data scientists and engineers is training machine-learning models to solve problems that would be simple for a human. 

While we may find it easy to infer the name of a company from the information on its website, training a machine-learning model to do the same is much harder. This challenge becomes even more difficult when working with multiple data points, even just the core Super Six attributes, because training a model to recognize and infer a company’s name is an entirely different process from training it to estimate a company’s annual revenue.

“There are two types of data attributes,” Rai says. “The first is deterministic attributes: the name of a company, its industry, its address. Then there are non-deterministic attributes, such as the revenue of a company. If a company is private, you cannot verify revenue figures, so you have to start predicting, making educated guesses. These estimates are fed as training examples to machine-learning models by humans because humans are good at estimates. And then we let the machine train and say, ‘Now can you predict?’ So the machine starts predicting.”
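A toy version of that workflow might look like the following, with fabricated numbers standing in for the human-labeled revenue estimates Rai describes:

```python
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training examples: observable signals (employee count,
# office locations) paired with human-estimated annual revenue in dollars.
X_train = [[5, 1], [40, 2], [200, 5], [12, 1], [85, 3]]
y_train = [400_000, 6_000_000, 45_000_000, 1_200_000, 14_000_000]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# "Now can you predict?" Estimate revenue for an unseen private
# company with 30 employees and 2 locations.
print(f"${model.predict([[30, 2]])[0]:,.0f}")
```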

The principle of combining algorithms and machine-learning technologies with human expertise is central to ZoomInfo’s approach to data. Algorithms and machine learning handle the computational heavy lifting, while data scientists and expert researchers ensure that the data is accurate. This virtuous cycle results in higher data fidelity and superior results for ZoomInfo customers.

ZoomInfo is constantly investing in these technologies to ensure that customers have the most accurate data possible for their go-to-market motions at every stage of the customer lifecycle. For Rai, the potential for better, more sophisticated data services is virtually limitless, and likely to keep him busy for the foreseeable future.

“If you think about Salesforce, what that company did was democratize CRM on the cloud,” Rai says. “It was the first true SaaS company. It’s now ZoomInfo’s time. We’re building the next-generation, modern go-to-market platform for sales professionals, where you don’t have to leave the ZoomInfo ecosystem. That’s something that keeps me excited.”