
With ‘GitHub for data,’ Gable.ai wants to connect software engineers and ML developers





AI applications are booming. But to keep them from breaking, the data flowing into those apps needs to be high-quality — that is, reliable, complete and accurate.

That’s the problem Gable.ai is poised to solve as the Seattle-based startup launches out of stealth today with $7 million in seed funding. It calls its offering the first data collaboration platform that allows software and data/ML developers to iteratively build and manage high-quality data assets. Investors have taken to calling it “GitHub for data,” and the founders of other data companies, such as Kaggle and Hex, are among those investing in it.

“GitHub is actually affecting culture — it’s helping software engineers from all around the company communicate with each other much more effectively,” said Chad Sanderson, CEO and co-founder of Gable.ai. “But that doesn’t exist for data at all.”

Gable.ai’s platform allows data producers and data consumers to work together, he told VentureBeat. It helps software and data developers prevent breaking changes to critical data workflows within their existing data infrastructure. The platform features data asset recognition by connecting data sources; data contract creation to establish data asset owners and set meaningful constraints; and data contract enforcement via continuous integration/continuous deployment within GitHub.
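
To make the contract idea concrete, here is a minimal Python sketch of how a data contract might name an owner, set constraints, and be checked in a CI step. It is an illustration only and not Gable.ai's actual API or format: the names `FieldRule`, `DataContract`, `find_breaking_changes` and `ci_exit_code`, the `shipments` asset and its columns are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FieldRule:
    """Constraint on one column (hypothetical structure)."""
    dtype: str
    nullable: bool = False


@dataclass
class DataContract:
    """Hypothetical contract: a data asset, its owner, and per-column rules."""
    asset: str
    owner: str
    fields: dict  # column name -> FieldRule


def find_breaking_changes(contract, proposed_schema):
    """Compare a proposed schema against the contract.

    proposed_schema maps column name -> (dtype, nullable).
    Returns a list of human-readable violations; empty means compatible.
    """
    violations = []
    for name, rule in contract.fields.items():
        if name not in proposed_schema:
            violations.append(f"{name}: column removed")
            continue
        dtype, nullable = proposed_schema[name]
        if dtype != rule.dtype:
            violations.append(
                f"{name}: type changed from {rule.dtype} to {dtype}"
            )
        if nullable and not rule.nullable:
            violations.append(f"{name}: became nullable")
    return violations


def ci_exit_code(violations):
    """A CI job would fail (non-zero exit) when any violation exists."""
    return 1 if violations else 0


# Assumed example data: a contract owned by a producing team.
contract = DataContract(
    asset="shipments",
    owner="logistics-team",
    fields={
        "shipment_id": FieldRule("string"),
        "price_usd": FieldRule("float"),
        "delivered_at": FieldRule("timestamp", nullable=True),
    },
)

# A proposed change that retypes one column and drops another.
proposed = {
    "shipment_id": ("string", False),
    "price_usd": ("string", False),
}

violations = find_breaking_changes(contract, proposed)
for v in violations:
    print(f"[{contract.asset}] notify {contract.owner}: {v}")
```

In a setup like this, a pull request that changes a schema would run the check, and a non-zero result from `ci_exit_code` would block the merge until the data owner and the consumers agree on the change.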


Founders led data department at Convoy

Before founding Gable.ai, Sanderson and his co-founders, Adrian Kreuziger and Daniel Dicker, led the data department at Convoy, the $4 billion digital freight network that moves thousands of truckloads around the country each day through an optimized, connected network of carriers. Complex data about shipments, shippers, facilities, carriers, trucks, contracts and prices came in fast and furiously.

While the company had the modern data stack, using the latest and greatest technologies, no one had any trust in the data: there were constant data quality issues, outages for valuable models, and billions of rows of data that could not be used.

“When our data science team and the analytics team were trying to understand even simple questions like ‘How many shipments did we do over the past 30 days?’, all of that complexity made it almost impossible to answer that question,” Sanderson said. “And it was the same problem in machine learning — the models were very, very sensitive and the data scientist needed to figure out exactly what data from this very complex system needed to go into that model. When the data quality was wrong, when something suddenly changed, all these sensitive models started to break down, and all the predictions that they made turned out to be wrong.”

Ultimately, he explained, the problem was the communication gap between software engineers and ML developers. “Once we helped bridge that gap, we saw the improvement of data quality exponentially almost immediately,” he said.

In order to scale AI, solving communication problems around changes to data is essential, Sanderson emphasized.

“If you don’t have a change management system for your data, you will not be able to scale AI — you just can’t,” he explained. “The way the Googles and Metas and Amazons solved this problem is throwing bodies at the problem. When a new machine learning model is shipped, there need to be two, three, four data engineers in the room.” But at a company like Convoy, he explained, “we didn’t have the ability to do that. Our data engineering team was six people.”

A new part of the data stack

Data contracts, Gable.ai says, are an entirely new category that it has been able to establish as an emerging data primitive, that is, a basic data type. In the last few months, Sanderson has built “Data Quality Camp,” a Slack community of more than 8,000 engaged data practitioners, around these new concepts.

These concepts are meant to mark a significant step towards reshaping the data landscape, becoming a new part of a company’s data stack, said Apoorva Pandhi, managing director at Zetta Venture Partners, which led the funding round.

“All the founders of successful data companies, whether it’s dbt Labs, Monte Carlo, Hex, Kaggle, Hightouch, Great Expectations, they’ve all invested in the company and endorsed the fact that this is an integral part of the data stack,” he said.
