Data diversifying
Houston data startup plans to expand its technology from oil and gas to include health care and defense industries
There are about 40,000 sensors on an offshore drilling rig, and each collects information about how the rig's many machines are operating. But sensors can fail or be miscalibrated and, in the relay between the rig and data scientists, data can pick up errors. The scientists first have to clean and validate the information to ensure it's credible.
That's where Pandata Tech comes in. The Houston-based company can run a data quality check for its oil and gas clients. But Gustavo Sanchez, co-founder and CEO of the company, is speeding up that process by automating it. Pandata Tech uses machine learning to review data generated by drilling rigs, and the algorithms determine how likely it is that the data can be trusted. And for the 175,000 Houston residents employed in oil and gas, that data needs to be trustworthy.
"If the data's bad, then you're going to have a lot of bad decisions," Sanchez says.
Machine learning dates to the 1950s, when computer scientist Arthur Samuel wrote a program that let a computer play checkers and improve at the game the more it played. Since then, complex algorithms that let computers learn and develop without human intervention have been put to work in finance, sales, surveillance and other industries.
Sanchez and his business partner, Jessica Reitmeier, met in China during graduate school. They founded the company after Sanchez's stint in finance data science. He realized that small service companies had no control over the equipment they operated, and he wanted to put data analytics in the hands of small, independent service contractors. So they launched Pandata Tech in January 2016, and today their core team has three people who manage data science, marketing and operations, and staff development.
Pandata Tech's software reduces the amount of time data scientists have to spend validating their data — from 80 percent of their time down to just 20 percent, Sanchez says. It works by using models to generate a data quality score.
For example, a sensor that monitors pressure levels is paired with a computer model of what those levels should be. The software checks for missing or incorrect data, then uses statistics to determine how likely it is that the sensor is picking up correct data. It creates a quality score for that data between 0 and 100 over both the short and the long term. If it compiles the data for a 24-hour window, the score should be close to 100. The software can also analyze data over 90-day stretches, and in that case the ideal might be above 60. It's a lot like a credit check, Sanchez says.
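The scoring idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Pandata Tech's actual method: the function name `quality_score`, the `tolerance` parameter and the sample readings are all hypothetical, and the "statistics" here are reduced to a simple share of readings that are present and close to the model's expected value.

```python
import math
from typing import Optional, Sequence


def quality_score(readings: Sequence[Optional[float]],
                  expected: Sequence[float],
                  tolerance: float) -> float:
    """Return a 0-100 score: the share of readings that are present and
    within `tolerance` of the model's expected value (hypothetical rule)."""
    if len(readings) != len(expected):
        raise ValueError("readings and expected must have the same length")
    if not readings:
        return 0.0
    good = 0
    for value, target in zip(readings, expected):
        if value is None or math.isnan(value):
            continue  # a missing reading counts against the score
        if abs(value - target) <= tolerance:
            good += 1
    return 100.0 * good / len(readings)


# Made-up pressure readings; None marks a dropped sample.
readings = [100.2, 99.8, None, 101.0, 130.0]
# Made-up model output for what the pressure should be.
expected = [100.0, 100.0, 100.0, 100.0, 100.0]
score = quality_score(readings, expected, tolerance=2.0)
print(score)
```

Here three of the five readings are present and within tolerance, so the score is 60. The same function could be run over a 24-hour window or a 90-day window of readings and compared against whatever threshold is considered acceptable for that window.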
And while Pandata Tech began in the energy industry, the team is now expanding to fields like defense and healthcare, which also generate hundreds of thousands of data points that need to be checked. The unique challenges of working with large drilling rigs have translated well to working with aircraft. And the healthcare field is similar: home to the Texas Medical Center, Houston has medical research centers that can benefit from speeding up data validation.
"There's so much data, and it's so noisy, that it's hard to know whether the data can be trusted or not," Sanchez says.
Pandata Tech is focusing on its current revenue sources in these three fields. Recently, it closed a deal with one of the largest offshore drilling companies in the world, and Sanchez hopes to double his team size within the year. But he's staying cautious. The move into the healthcare and defense industries is not just a way to expand the use of his company's technology. It's also a way of reducing risk by not relying on just one industry.
"It's hard to sell scale to a startup," Sanchez says. "We've gotta reduce our risk so we can continue to grow."