June 22, 2024

How to establish a data strategy for your fund

Gopi Sundaramurthy
Co-founder, Partner, and Head of Data Science at Ensemble VC

Gopi is the co-founder, Partner, and Head of Data Science at Ensemble VC. Prior to Ensemble, he was the Data Science Lead at Kauffman Fellows (under the umbrella of the Kauffman Fund) and a Data Scientist at Watson Health Consulting (part of IBM Watson Health). Ensemble’s thesis is to back great teams, regardless of the vertical. The model Gopi and his team have built uncovers those teams.

When we started Ensemble, I was very clear that data would be the core of our strategy. We made an early decision that we weren't going to be a software firm; we were going to be a data firm. Right now, our team is an even 50/50 split: we have as many data engineers as we have investors. There aren’t many firms that have that breakdown. But this was our bet, and it has paid off. Here’s how we have approached scaling our data strategy:

On list sorting and scoring models

I always tell people, my job is making and managing lists. Every month we create a new list and figure out how to prioritize it. What companies and people are on the list is one thing, and a lot of data-driven firms stop there. But the sorting of the list is as important as what’s on it. On any given day, an investor only has so much time, so the few companies they’re going to focus on that day should be the absolute best-fit companies based on our thesis, our ability to connect with the team, and so on. Early on, we’d send these lists out to the team, and the fatigue would set in immediately. An investor might make an arbitrary cutoff at row 25, knowing their time was limited. But what if the company on row 26 was the best company we could invest in? We needed to be more data-driven about the next step.

So we built a model that outputs rankings much the same way Harmonic’s Relevance Score does. We gave ourselves a KPI after performing a calendar audit. The biggest challenge investors were facing with lists was they were spending too much time talking to people to discern which deals they should be on. Let’s say a VC talks to 100 companies and ultimately makes one investment. In any other field, that’s not a great ratio. One hundred 30-minute calls mean 50 hours on the phone to make one decision—and that doesn’t include diligence or any of the critical analyses that humans are so good at. That’s just time spent qualifying things. So we asked: Can we bring that number down? And our KPI became: Let’s get that ratio down to 10-to-1. We’re closer to 5-to-1 right now.
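The arithmetic behind that KPI can be checked in a few lines. This is only an illustrative sketch: the function name `qualification_hours` is made up for this example, and the 30-minute call length is the figure quoted above.

```python
def qualification_hours(calls_per_investment: float, minutes_per_call: float = 30) -> float:
    """Hours spent on qualifying calls for each investment made."""
    return calls_per_investment * minutes_per_call / 60


# Compare the starting ratio, the KPI target, and the current ratio.
for ratio in (100, 10, 5):
    hours = qualification_hours(ratio)
    print(f"{ratio}-to-1: {hours:.1f} hours of calls per investment")
```

At 100-to-1 this gives 50 hours of calls per investment; at 10-to-1 it gives 5 hours, and at 5-to-1 it gives 2.5 hours.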

A VC might come in saying: A good friend in my network said this deal is amazing. We don’t discredit that deal just because we’ve already pulled a list of what the data says are the most interesting companies. The company just becomes another data point. The goal is to absorb that deal into our list and rank it appropriately.
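One way to picture "absorb and rank" is a simple merge followed by a sort. The sketch below is not Ensemble's model. The `Deal` fields, the `merge_and_rank` helper, the company names, and the scores are all hypothetical, and it assumes a referred deal is scored on the same scale as the model's own picks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deal:
    company: str
    score: float  # hypothetical model score; higher means a better fit
    source: str   # "model" or "referral"


def merge_and_rank(model_list, submitted):
    """Combine both sources, keep the first entry per company, sort by score."""
    by_company = {}
    for deal in [*model_list, *submitted]:
        # setdefault keeps the earlier entry if a company appears twice.
        by_company.setdefault(deal.company, deal)
    return sorted(by_company.values(), key=lambda d: d.score, reverse=True)


model_list = [Deal("Acme", 0.91, "model"), Deal("Borealis", 0.74, "model")]
submitted = [Deal("Cobalt", 0.83, "referral")]

for rank, deal in enumerate(merge_and_rank(model_list, submitted), start=1):
    print(rank, deal.company, deal.source)
```

Here the referred company lands in second place, between the two model-sourced deals. Its position comes from its score, not from how it reached the list.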

List prioritization ensures two key things for our teams: 1) we tend to be more outbound-focused than traditional VC firms, and 2) we can plan our investment process weeks in advance, thereby making efficient use of partners’ time. We also have to account for the fact that those lists are dynamically changing based on what investors are bringing in.

Companies we choose to invest in and companies we choose not to invest in are equally valuable data points, because they go back into the model and inform it. By default, nobody in our organization has permission to delete a record; they can only sunset it. Every deal that’s sunsetted goes into a base, and we’ll do a quarterly retrospective on those bases to figure out why so many deals are getting sunsetted in a given space. We can aggregate a lot more data there to figure out what features, metadata, and so on we need to get better at those primary lists.
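The sunset-instead-of-delete rule resembles what engineers often call a soft delete. The sketch below shows the general idea under stated assumptions: the `DealRecord` fields, the sunset reasons, and the `sunset_counts_by_sector` helper are all hypothetical, and the actual system and schema are not described here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DealRecord:
    company: str
    sector: str
    sunset_on: Optional[date] = None
    sunset_reason: Optional[str] = None

    def sunset(self, reason: str, on: date) -> None:
        """Mark the record inactive; it is never removed from the dataset."""
        self.sunset_on = on
        self.sunset_reason = reason


def sunset_counts_by_sector(records, start: date, end: date) -> Counter:
    """Count records sunsetted between start and end (inclusive), by sector."""
    return Counter(
        r.sector
        for r in records
        if r.sunset_on is not None and start <= r.sunset_on <= end
    )


records = [
    DealRecord("Acme", "fintech"),
    DealRecord("Borealis", "fintech"),
    DealRecord("Cobalt", "climate"),
]
records[0].sunset("not raising", date(2024, 4, 10))
records[1].sunset("valuation", date(2024, 5, 2))
records[2].sunset("team fit", date(2024, 1, 15))

# Retrospective for one quarter: which sectors lost the most deals?
print(sunset_counts_by_sector(records, date(2024, 4, 1), date(2024, 6, 30)))
```

Because nothing is deleted, all three records stay available for later analysis. Only the two sunsetted within the quarter are counted in that quarter's retrospective.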

On sprint cycles

I see my team as a startup within a VC firm. We have a client of one, which is our investment team. It’s our job to build products for them, and we want to be innovators in the space, so we’re constantly testing tools and building on our algorithms, which are as sophisticated as those in some of our portfolio companies. The startup ethos also plays out in the way we run the investment team on a sprint cycle. Early on, we noticed inconsistency in how long it took to go through a list. Sometimes it would take two weeks; other times it would take two months. So we revised our cadence. Our model now identifies 500 to 700 deals a month (the top 2 to 3 percent) and moves them into a system so the entire investment team is looking at the same universe. Of course, they can submit new deals that automatically get merged into the list, but the point is to move from discovery to quarterbacking. The latter allows investors to be much more efficient, because they know the exact steps that need to be taken.
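Cutting a scored universe down to its top few percent can be sketched as follows. This is a generic illustration, not the firm's pipeline: the `top_fraction` helper, the synthetic scores, and the universe size of 1,000 are assumptions made for the example.

```python
def top_fraction(scored, fraction):
    """Return the highest-scoring share of (company, score) pairs, at least one."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, round(len(scored) * fraction))
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked[:keep]


# Synthetic universe: 1,000 companies with made-up scores.
universe = [(f"company-{i}", i / 1000) for i in range(1000)]

monthly_list = top_fraction(universe, 0.025)  # keep the top 2.5 percent
print(len(monthly_list), monthly_list[0][0])
```

Fixing the cut as a share of the universe, not as a score threshold, keeps the monthly list a predictable size, which is what makes a fixed sprint cadence workable.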

The sprint looks like this. During week one, investors validate all the leads they think are interesting (which company is likely to be raising, when, and why). They get in touch with those companies; they get back-channel references; they work that list of 700 down to something more manageable. In weeks two, three, and four, we literally just do stand-ups: Okay, we spoke about this deal last month. We asked for information from this person; did we get feedback? Can we figure out how to get a warm introduction to the company? Oh, we're not getting an introduction because they’re not raising? Okay, let's prioritize it for next month. At the end of the month, we pull another list, and the next sprint cycle begins. The process is now far more manageable, and much more repeatable, because it's made up of linear parts. And our data has benefited from it, since every investor is going through the same list and giving us immediate feedback on what’s working and what’s not.

On process

Getting investors’ buy-in for process is just as important as having a sophisticated model. When I first started building lists, I’d share them out and VCs simply wouldn’t use them. They’d say: Nice list, but I have three deals from people in my network that I'm going to chase instead. Getting the firm to change behavior was almost as challenging as assembling the data. I realized that VCs are always going to find deals in their network, and that's great. But I needed to build a way for them to submit those deals so I could rank them into their list. I wanted them to know that what they were submitting could be worked on, but alongside the exciting opportunities the data found.

So maybe an investor told us that deal X was interesting, but we were telling the investor that out of five deals, deal Y was the most interesting. I wasn’t going to argue. I’d say: Go and talk to both companies and tell me what was interesting or not. Give us feedback. That feedback would go into the model, and by the second or third cycle, the investor would say: You told me these two companies are the most interesting ones, and they are. The point is to work with their process but to reinforce it with data. You have to combine innovations in data with innovations in process to change behavior and leverage the data well. Otherwise, data innovation just happens in a vacuum. Showing the investor team that they could continue to do what they were doing and more through this new process ultimately offered the firm tremendous value. 


On the limitations of data

I spend my life building models, and what the model is great at is saying: Here’s a list of 50 compelling companies. What humans are great at is understanding the value of what’s being built at each of those companies. We use data for sourcing and screening; but when it comes to evaluating companies, the usefulness of the data has run its course. There will be strong disqualifiers that a model can never predict: Was the founder unprofessional? Is the customer reference subpar? No amount of data will prepare you for that. Investors make judgment calls based on hunches and trends and the perceptions they form of founders and their customers. Those are critical human evaluation points. They always will be for VC.
