Let's talk about dark data: what it means and how to navigate it. Graphic by Miguel Tovar/University of Houston

Is it necessary to share ALL your data? Is transparency a good thing, or does it make researchers “vulnerable,” as author Nathan Schneider suggests in the Chronicle of Higher Education article “Why Researchers Shouldn’t Share All Their Data”?

Dark Data Defined

Dark data is defined as the universe of information an organization collects, processes and stores – oftentimes for compliance reasons. Dark data never makes it to the official publication part of the project. According to the Gartner Glossary, “storing and securing data typically incurs more expense (and sometimes greater risk) than value.”

This topic is reminiscent of the file drawer effect, a phenomenon which reflects the influence of the results of a study on whether or not the study is published. Negative results can be just as important as hypotheses that are proven.

Publication bias, the pressure to publish only positive research that supports the PI’s hypothesis, is arguably not good science. In an article in the Indian Journal of Anaesthesia, authors Priscilla Joys Nagarajan, et al., wrote: “It is speculated that every significant result in the published world has 19 non-significant counterparts in file drawers.” That’s one definition of dark data.
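
One way to read the "19 counterparts" figure is as simple arithmetic on the conventional 0.05 significance threshold: if no real effect exists, about 1 study in 20 will still come out significant by chance. The short Python sketch below works through that reading. The 0.05 threshold, the function names, and the simulation itself are illustrative assumptions, not something taken from the quoted article.

```python
import random
from fractions import Fraction


def file_drawer_ratio(alpha: Fraction) -> Fraction:
    """Expected non-significant studies per significant one,
    assuming every study tests an effect that does not exist."""
    return (1 - alpha) / alpha


def simulate(num_studies: int, alpha: float, seed: int = 0) -> tuple[int, int]:
    """Count (significant, non-significant) studies in a toy simulation.

    Assumption: with no real effect, each study's p-value is uniformly
    distributed between 0 and 1, so it falls below alpha by chance alone.
    """
    rng = random.Random(seed)
    significant = sum(1 for _ in range(num_studies) if rng.random() < alpha)
    return significant, num_studies - significant


if __name__ == "__main__":
    # Exact arithmetic: (1 - 1/20) / (1/20) = 19
    print(file_drawer_ratio(Fraction(1, 20)))

    # Simulated counts land close to the same ratio.
    sig, non_sig = simulate(100_000, 0.05)
    print(sig, non_sig, round(non_sig / sig, 1))
```

Under these assumptions, the studies that reach publication are the rare chance hits, while the much larger remainder stays in the file drawer.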

Total Transparency

But what to do with all your excess information that did not make it to publication, most likely because of various constraints? Should everything, meaning every little tidbit, be readily available to the research community?

Schneider doesn’t think it should be. In his article, he writes that he hides some findings in a paper notebook or behind a password, and he keeps interviews and transcripts offline altogether to protect his sources.

Open-source

Open-source software communities tend to regard total transparency as inherently good. What are the advantages of total transparency? You may make connections between projects that you wouldn’t have otherwise. You can easily reproduce a peer’s experiment. You can even become more meticulous in your note-taking and experimental methods since you know it’s not private information. Similarly, journalists will recognize this thought pattern as the recent, popular call to engage in “open journalism.” Essentially, an author’s entire writing and editing process can be recorded, step by step.

TMI

This trend has led researchers to tools popular in open-source work, like Jupyter and GitHub. Such tools can detail every change that occurs along a project’s timeline. Are unorganized, excessive amounts of unpublishable data really what transparency means? Or do they confuse those looking for meaningful research that is meticulously curated?

The Big Idea

And what about the “vulnerability” claim? Sharing every edit and every new direction taken opens a scientist up to scoffers and harassment, even. Dark data in industry even involves publishing salaries, which can feel unfair to underrepresented, marginalized populations.

In Model View Culture, Ellen Marie Dash wrote: “Let’s give safety and consent the absolute highest priority, with openness and transparency prioritized explicitly below those. This means digging deep, properly articulating in detail what problems you are trying to solve with openness and transparency, and handling them individually or in smaller groups.”

------

This article originally appeared on the University of Houston's The Big Idea. Sarah Hill, the author of this piece, is the communications manager for the UH Division of Research.


Houston-based equitable entrepreneurship tech platform expands programs

coming soon

Fresh off of celebrating the dismissal of a lawsuit from former Trump Administration officials, Hello Alice is expanding some of its offerings for entrepreneurs.

In partnership with top organizations — like Progressive, Antares Capital, Wells Fargo, and FedEx — Hello Alice has added new offerings for its 2024 Boost Camp programs, a mix of skill-building support and grant opportunities.

“We are fortunate to continue working with great enterprise partners who share our commitment to supporting Main Street through crucial grants and mentorship programs,” Carolyn Rodz, CEO and co-founder of Hello Alice, says in a news release. “Small businesses drive our economy, yet often lack the necessary financing and resources. By partnering with major companies, Hello Alice is ensuring that small businesses have access to the tools and opportunities they need to thrive and create jobs in their local communities. Together, we are building a robust support system that fosters innovation and growth for small businesses across the country.”

This year's programs, according to Hello Alice, are as follows:

  • Antares Capital REACH Cohort: The Antares REACH Grant Program provides $20,000 grants to small businesses. Grant recipients will also take part in Antares’ Growth Track Boost Camp program, a digital community which will be home to monthly business coaching workshops, mentorship, networking, and more. Applications are open until June 28, and the program begins August 8.
  • Progressive Driving Small Business Forward Grant & Booster Camp Program: Progressive is dedicating $1 million to award 20 deserving businesses with a $50,000 grant each. Grant recipients will be invited to attend an exclusive 12-week virtual Boost Camp coaching program. Applications have closed for the program beginning September 10.
  • Wells Fargo: Wells Fargo is supporting four virtual accelerator programs over the next 18 months, designed to support up to 500 participants for each program, with a focus on business health and credit-building practices. Applications will be announced this summer for the program, which will begin in early fall.
  • FedEx: The FedEx Entrepreneur Fund supports entrepreneurs in the United States by providing them with the necessary funding, resources, and networks to enhance the success of their businesses, including the Boost Camp coaching program. Applications will be announced this fall for the program, which will begin in the winter.

More information and application access are available online.

Last year's Boost programs benefitted 100 small businesses, according to Hello Alice, which reported that 60 percent of participants in the 2023 Antares REACH Cohort saw an increase in their Business Health Score and 93 percent felt better equipped to confront challenges and capitalize on opportunities. In the end, 85 percent of participants felt more optimistic about their business growth prospects.

“Hello Alice is proud to partner with high-level enterprise companies to empower small businesses and foster their success,” Natalie Diamond, vice president of business development at Hello Alice, adds. “Together, we are creating unparalleled opportunities for entrepreneurs to achieve brand success, drive financial fitness, and thrive in today's competitive market. Our joint endeavors not only offer access to capital and resources but also provide tailored guidance and mentorship, arming small business owners with the insights and support necessary to navigate challenges and seize growth opportunities.”

Houston company's sustainable oil product reaches milestone production capacity 5 years early

overachieving

Houston-based biotech company Cemvita has achieved a key production goal five years ahead of schedule.

Thanks to technology advancements, Cemvita is now capable of generating 500 barrels per day of sustainable oil from carbon waste at its first commercial plant. As a result, Cemvita has quadrupled output at the Houston plant. The company had planned to reach this milestone in 2029.

Cemvita, founded in 2017, says this achievement paves the way for increased production capacity, improved operational efficiency, and an elevated advantage in the sustainable oil market.

“What’s so amazing about synthetic biology is that humans are just scratching the surface of what’s possible,” says Moji Karimi, co-founder and CEO of Cemvita. “Our focus on the first principles has allowed us to design and create new biotech more cheaply and faster than ever before.”

The production achievement follows Cemvita’s recent breakthrough in development of a solvent-free extraction bioprocess.

In 2023, United Airlines agreed to buy up to one billion gallons of sustainable aviation fuel from Cemvita’s first full-scale plant over the course of 20 years.

Cemvita’s investors include the UAV Sustainable Flight Fund, an investment arm of Chicago-based United; Oxy Low Carbon Ventures, an investment arm of Houston-based energy company Occidental Petroleum; and Japanese equipment and machinery manufacturer Mitsubishi Heavy Industries.

Tech disruptions sparked by Texas co.'s update highlight the fragility of globally connected technology

Airlines, banks, hospitals and other risk-averse organizations around the world chose cybersecurity company CrowdStrike to protect their computer systems from hackers and data breaches.

But all it took was one faulty CrowdStrike software update to cause global disruptions Friday that grounded flights, knocked banks and media outlets offline, and disrupted hospitals, retailers and other services.

“This is a function of the very homogenous technology that goes into the backbone of all of our IT infrastructure,” said Gregory Falco, an assistant professor of engineering at Cornell University. “What really causes this mess is that we rely on very few companies, and everybody uses the same folks, so everyone goes down at the same time.”

The trouble with the update issued by CrowdStrike and affecting computers running Microsoft's Windows operating system was not a hacking incident or cyberattack, according to CrowdStrike, which apologized and said a fix was on the way.

But it wasn't an easy fix. It required “boots on the ground” to remediate, said Gartner analyst Eric Grenier.

“The fix is working, it’s just a very manual process and there’s no magic key to unlock it,” Grenier said. “I think that is probably what companies are struggling with the most here.”

While not everyone is a client of CrowdStrike and its platform known as Falcon, it is one of the leading cybersecurity providers, particularly in transportation, healthcare, banking and other sectors that have a lot at stake in keeping their computer systems working.

“They’re usually risk-averse organizations that don’t want something that’s crazy innovative, but that can work and also cover their butts when something goes wrong. That’s what CrowdStrike is,” Falco said. “And they’re looking around at their colleagues in other sectors and saying, ‘Oh, you know, this company also uses that, so I’m gonna need them, too.’”

Worrying about the fragility of a globally connected technology ecosystem is nothing new. It's what drove fears in the 1990s of a technical glitch that could cause chaos at the turn of the millennium.

“This is basically what we were all worried about with Y2K, except it’s actually happened this time,” wrote Australian cybersecurity consultant Troy Hunt on the social platform X.

Across the world Friday, affected computers were showing the “blue screen of death” — a sign that something went wrong with Microsoft's Windows operating system.

But what's different now is “that these companies are even more entrenched,” Falco said. “We like to think that we have a lot of players available. But at the end of the day, the biggest companies use all the same stuff.”

Founded in 2011 and publicly traded since 2019, CrowdStrike describes itself in its annual report to financial regulators as having “reinvented cybersecurity for the cloud era and transformed the way cybersecurity is delivered and experienced by customers.” It emphasizes its use of artificial intelligence in helping to keep pace with adversaries. It reported having 29,000 subscribing customers at the start of the year.

The Austin, Texas-based firm is one of the more visible cybersecurity companies in the world and spends heavily on marketing, including Super Bowl ads. At cybersecurity conferences, it's known for large booths displaying massive action-figure statues representing different state-sponsored hacking groups that CrowdStrike technology promises to defend against.

CrowdStrike CEO George Kurtz is among the most highly compensated in the world, recording more than $230 million in total compensation in the last three years. Kurtz is also a driver for a CrowdStrike-sponsored car racing team.

After his initial statement about the problem was criticized for lack of contrition, Kurtz apologized in a later social media post Friday and on NBC's “Today Show.”

“We understand the gravity of the situation and are deeply sorry for the inconvenience and disruption,” he said on X.

Richard Stiennon, a cybersecurity industry analyst, said this was a historic mistake by CrowdStrike.

“This is easily the worst faux pas, technical faux pas or glitch of any security software provider ever,” said Stiennon, who has tracked the cybersecurity industry for 24 years.

While the problem is an easy technical fix, he said, its impact could be long-lasting for some organizations because of the hands-on work needed to fix each affected computer. “It’s really, really difficult to touch millions of machines. And people are on vacation right now, so, you know, the CEO will be coming back from his trip to the Bahamas in a couple of weeks and he won’t be able to use his computers.”

Stiennon said he did not think the outage revealed a bigger problem with the cybersecurity industry or CrowdStrike as a company.

“The markets are going to forgive them, the customers are going to forgive them, and this will blow over,” he said.

Forrester analyst Allie Mellen credited CrowdStrike for clearly telling customers what they need to do to fix the problem. But to restore trust, she said there will need to be a deeper look at what occurred and what changes can be made to prevent it from happening again.

“A lot of this is likely to come down to the testing and software development process and the work that they’ve put into testing these kinds of updates before deployment,” Mellen said. “But until we see the complete retrospective, we won’t know for sure what the failure was.”