Let's talk about dark data — what it means and how to navigate it. Graphic by Miguel Tovar/University of Houston

Is it necessary to share ALL your data? Is transparency a good thing, or does it make researchers “vulnerable,” as author Nathan Schneider suggests in his Chronicle of Higher Education article, “Why Researchers Shouldn’t Share All Their Data”?

Dark Data Defined

Dark data is defined as the universe of information an organization collects, processes, and stores, oftentimes for compliance reasons alone. Dark data never makes it into the official publication side of a project. According to the Gartner Glossary, “storing and securing data typically incurs more expense (and sometimes greater risk) than value.”

This topic is reminiscent of the file drawer effect, the tendency for a study’s results to determine whether the study gets published at all. Negative results can be just as important as those that confirm a hypothesis.

Publication bias, and the pressure to publish only positive research that supports the PI’s hypothesis, is arguably not good science. In an article in the Indian Journal of Anaesthesia, Priscilla Joys Nagarajan et al. wrote: “It is speculated that every significant result in the published world has 19 non-significant counterparts in file drawers.” That’s one definition of dark data.

Total Transparency

But what should be done with all the excess information that did not make it to publication, most likely because of various constraints? Should everything, meaning every little tidbit, be readily available to the research community?

Schneider doesn’t think it should be. In his article, he writes that he hides some findings in a paper notebook or behind a password, and he keeps interviews and transcripts offline altogether to protect his sources.

Open-source

Open-source software communities tend to regard total transparency as inherently good. What are the advantages of total transparency? You may make connections between projects that you wouldn’t have otherwise. You can easily reproduce a peer’s experiment. You can even become more meticulous in your note-taking and experimental methods since you know it’s not private information. Similarly, journalists will recognize this thought pattern as the recent, popular call to engage in “open journalism.” Essentially, an author’s entire writing and editing process can be recorded, step by step.

TMI

This trend has led researchers to open-source tools like Jupyter and platforms like GitHub, which record every change that occurs along a project’s timeline. But are unorganized, excessive amounts of unpublishable data really what transparency means? Or do they confuse those looking for meaningful research that is meticulously curated?

The Big Idea

And what about the “vulnerability” claim? Sharing every edit and every new direction taken opens a scientist up to scoffers and even harassment. In industry, transparency can extend to publishing salaries, which can feel unfair to underrepresented, marginalized populations.

In Model View Culture, Ellen Marie Dash wrote: “Let’s give safety and consent the absolute highest priority, with openness and transparency prioritized explicitly below those. This means digging deep, properly articulating in detail what problems you are trying to solve with openness and transparency, and handling them individually or in smaller groups.”

------

This article originally appeared on the University of Houston's The Big Idea. Sarah Hill, the author of this piece, is the communications manager for the UH Division of Research.

"Better and personalized healthcare through AI is still a hugely challenging problem that will take an army of scientists and engineers." Photo via UH.edu

Houston expert explains health care's inequity problem

guest column

We are currently in the midst of what some have called the "wild west" of AI. Though healthcare is one of the most heavily regulated sectors, the regulation of AI in this space is still in its infancy. The rules are being written as we speak. We are playing catch-up by learning how to reap the benefits these technologies offer while minimizing any potential harms once they've already been deployed.

AI systems in healthcare exacerbate existing inequities. We've seen this play out with real-world consequences, from racial bias in the American justice system and credit scoring to gender bias in resume screening applications. Programs that are designed to bring machine "objectivity" and ease to our systems end up reproducing and upholding biases with no means of accountability.

The algorithm itself is seldom the problem. It is often the data used to program the technology that merits concern. But this is about far more than ethics and fairness. Building AI tools that take account of the whole picture of healthcare is fundamental to creating solutions that work.

The Algorithm is Only as Good as the Data

By nature of our own human systems, datasets are almost always partial and rarely ever fair. As Linda Nordling comments in "A fairer way forward for AI in healthcare," a Nature article, "this revolution hinges on the data that are available for these tools to learn from, and those data mirror the unequal health system we see today."

Take, for example, the finding that Black people in US emergency rooms are 40 percent less likely to receive pain medication than are white people, and Hispanic patients are 25 percent less likely. Now, imagine the dataset these findings are based on is used to train an algorithm for an AI tool that would be used to help nurses determine if they should administer pain relief medication. These racial disparities would be reproduced and the implicit biases that uphold them would remain unquestioned, and worse, become automated.

We can attempt to mitigate these biases by removing the data we believe causes them from training, but hidden patterns that correlate with demographic data will remain. An algorithm cannot take in the nuances of the full picture; it can only learn from patterns in the data it is presented with.
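To make this concrete, here is a minimal sketch in Python. Everything in it is synthetic and invented for illustration: a model is trained on historically biased treatment decisions with the race column deliberately excluded, yet a correlated proxy feature (here, which hospital a patient visited) lets the disparity through anyway.

```python
# Minimal sketch with synthetic data: a model trained on biased historical
# treatment decisions reproduces the disparity even though the race column
# is never given to it, because a correlated proxy (hospital) remains.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
race = rng.integers(0, 2, n)                  # 0 = group A, 1 = group B (hidden)
pain = rng.uniform(0, 10, n)                  # reported pain score
hospital = race * 2 + rng.integers(0, 2, n)   # proxy: groups cluster by hospital

# Historical (biased) decisions: group B is under-treated at equal pain levels.
gave_meds = (pain - 2.5 * race + rng.normal(0, 1, n)) > 4

X = np.column_stack([pain, hospital])         # race itself deliberately excluded
model = LogisticRegression().fit(X, gave_meds)

pred = model.predict(X)
for g in (0, 1):
    print(f"group {g}: predicted treatment rate = {pred[race == g].mean():.2f}")
# The gap persists: the model has effectively recovered race via the proxy.
```

Dropping the sensitive column changes nothing here, because the proxy carries the same signal. That is exactly the "hidden patterns" problem described above.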

Bias Creep

Data bias creeps into healthcare in unexpected ways. Consider the fact that animal models used in laboratories across the world to discover and test new pain medications are almost entirely male. As a result, many medications, including pain medication, are not optimized for females. So, it makes sense that even common pain medications like ibuprofen and naproxen have proven more effective in men than in women, and that women tend to experience worse side effects from pain medication than men do.

In reality, male rodents aren't perfect test subjects either. Studies have also shown that both female and male rodents' responses to pain levels differ depending on the sex of the human researcher present. The stress response elicited in rodents to the olfactory presence of a sole male researcher is enough to alter their responses to pain.

While this example may seem to be a departure from AI, it is in fact deeply connected — the current treatment choices we have access to were implicitly biased before the treatments ever made it to clinical trials. The challenge of AI equity is not a purely technical problem, but a very human one that begins with the choices that we make as scientists.

Unequal Data Leads to Unequal Benefits

In order for all of society to enjoy the many benefits that AI systems can bring to healthcare, all of society must be equally represented in the data used to train these systems. While this may sound straightforward, it's a tall order to fill.

Data from some populations don't always make it into training datasets. This can happen for a number of reasons. Some data may be less accessible, or may never be collected at all, due to existing systemic challenges, such as a lack of access to digital technology, or because the data are simply deemed unimportant. Predictive models are created by categorizing data in a meaningful way. But because there's generally less of it, "minority" data tends to be an outlier in datasets and is often wiped out as spurious in order to create a cleaner model.
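Here is a hedged sketch, with made-up numbers, of how that wiping-out can happen: a routine cleaning step that trims statistical outliers quietly discards most of an underrepresented group whose measurements cluster differently from the majority's.

```python
# Minimal sketch with synthetic data: a naive outlier filter that trims rows
# far from the overall mean removes a disproportionate share of the minority
# group, whose values follow a different distribution than the majority's.
import numpy as np

rng = np.random.default_rng(1)
majority = rng.normal(loc=100, scale=10, size=9_500)  # e.g., a lab measurement
minority = rng.normal(loc=130, scale=10, size=500)    # different distribution
values = np.concatenate([majority, minority])
group = np.array([0] * 9_500 + [1] * 500)

# "Clean" the data: drop anything more than 2 standard deviations from the mean.
z = (values - values.mean()) / values.std()
kept = np.abs(z) <= 2

for g, name in [(0, "majority"), (1, "minority")]:
    print(f"{name}: {kept[group == g].mean():.0%} of records survive the filter")
# Roughly 98% of majority records survive; most minority records do not,
# so any model trained on the "cleaned" data barely sees that group at all.
```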

Data source matters because this detail unquestionably affects the outcome and interpretation of healthcare models. In sub-Saharan Africa, young women are diagnosed with breast cancer at a significantly higher rate than their peers in the Global North. This reveals the need for AI tools and healthcare models tailored to this demographic group, as opposed to breast cancer detection tools trained only on mammograms from the Global North. Likewise, a growing body of work suggests that algorithms used to detect skin cancer tend to be less accurate for Black patients because they are trained mostly on images of light-skinned patients. The list goes on.

We are creating tools and systems that have the potential to revolutionize the healthcare sector, but the benefits of these developments will only reach those represented in the data.

So, what can be done?

Part of the challenge in getting bias out of data is that high-volume, diverse, and representative datasets are not easy to access. Publicly available training datasets tend to be extremely narrow, low-volume, and homogeneous; they capture only a partial picture of society. At the same time, a wealth of diverse health data is captured every day in many healthcare settings, but data privacy laws make accessing these more voluminous and diverse datasets difficult.

Data protection is of course vital. Big Tech and governments do not have the best track record when it comes to the responsible use of data. However, if transparency, education, and consent for the sharing of medical data were more purposefully regulated, far more diverse and high-volume datasets could contribute to fairer representation across AI systems and result in better, more accurate results for AI-driven healthcare tools.

But data sharing and access is not a complete fix to healthcare's AI problem. Better and personalized healthcare through AI is still a hugely challenging problem that will take an army of scientists and engineers. At the end of the day, we want to teach our algorithms to make good choices but we are still figuring out what good choices should look like for ourselves.

AI presents the opportunity to bring greater personalization to healthcare, but it equally presents the risk of entrenching existing inequalities. We have the opportunity in front of us to take a considered approach to data collection, regulation, and use that will provide a fuller and fairer picture and enable the next steps for AI in healthcare.

------

Angela Wilkins is the executive director of the Ken Kennedy Institute at Rice University.

UH is officially part of an initiative to diversify machine learning research. Photo courtesy of University of Houston

University of Houston joins $50M initiative to expand and diversify AI and machine learning research

money moves

A $50 million grant from the National Institutes of Health is expanding research in machine learning and artificial intelligence, and the University of Houston now has a seat at the table.

UH has joined a national initiative to increase the diversity of artificial intelligence researchers, according to a news release from the school. Thanks to a $50 million grant from the National Institutes of Health, the University of North Texas Health Science Center will lead the coordinating center of the AIM-AHEAD program, which stands for Artificial Intelligence/Machine Learning Consortium to Advance Health Equity and Researcher Diversity.

"Beyond health care, AI has been used in areas from facial recognition to self-driving cars and beyond, but there is an extreme lack of diversity among the developers of AI/ML tools. Many studies have shown that flawed AI systems and algorithms perpetuate gender and racial biases and have resulted in untoward outcomes," says Bettina Beech, chief population health officer at the University of Houston and newly named AIM-AHEAD coordinating center team member.

The initiative will bring together collaborators and experts across AI and machine learning, health equity research, data science training, data infrastructure and more. The other institutions involved include the University of Colorado-Anschutz Medical Center in Aurora; the University of California, Los Angeles; Meharry Medical College in Nashville; Morehouse School of Medicine in Atlanta; Johns Hopkins University; and Vanderbilt University Medical Center.

"This network will be foundational to achieving the goals of the AIM-AHEAD program, which include providing more inclusive data for health disparities research, and enhancing the diversity of AI/ML leadership," says Susan Gregurick, NIH associate director for data science, in the release.

Unfortunately, AI — designed by humans — mimics human decision making through its choice of algorithms. This means that the same biases humans deal with have made it into AI decision making too. These gaps can lead to continued disparities and inequities for underrepresented communities, especially in regard to health care, job hiring, and more.

"AI solutions need to be implemented in a responsible manner and are now guided by AI ethical FAIR (findable, accessible, interoperable, reusable) principles," says Beech in the release. "The AIM-AHEAD project directly connects with the University of Houston's plan to train and diversify the future workforce in population health, increase the use of digital tools for chronic disease self-management, and to advance population health research."

Bettina Beech is the chief population health officer at the University of Houston and newly named AIM-AHEAD coordinating center team member. Photo via UH.edu

This Houston startup has a game-changing technology for deep learning. Photo via Getty Images

Houston artificial intelligence startup raises $6M in seed funding

money moves

A computer science professor at Rice University raised seed funding last month to grow his company, which is focused on democratizing artificial intelligence tools.

ThirdAI, founded by Anshumali Shrivastava in April, raised $6 million in a seed funding round from three California-based VCs — Neotribe Ventures and Cervin Ventures, which co-led the round, with support from Firebolt Ventures.

Shrivastava, CEO, co-founded the company with Tharun Medini, a recent Ph.D. graduate who studied under Shrivastava in Rice's Department of Electrical and Computer Engineering. Medini serves as CTO of ThirdAI — pronounced "third eye." The startup is building the next generation of scalable and sustainable AI tools and deep learning systems.

"We are democratizing artificial intelligence through software innovations," says Shrivastava in a news release from Rice. "Our innovation would not only benefit current AI training by shifting to lower-cost CPUs, but it should also allow the 'unlocking' of AI training workloads on GPUs that were not previously feasible."

The technology ThirdAI is working with comes from 10 years of deep learning research and innovation. The company's technology has the potential to make computing 15 times faster.

"ThirdAI has developed a breakthrough approach to train deep learning models with a large number of parameters that run efficiently on general purpose CPUs. This technology has the potential to result in a gigantic leap forward in the accuracy of deep learning models," per and announcement from Cervin Ventures. "Our investment in ThirdAI was a no-brainer and we are fortunate to have had the opportunity to invest."

Anshumali Shrivastava is an associate professor of computer science at Rice University. Photo via rice.edu

In a guest column, these lawyers explain the pros and cons of using AI for hiring. Photo via Getty Images

Here's what Houston employers need to know about using artificial intelligence in the hiring process

guest column

Workplace automation has entered the human resource department. Companies rely increasingly on artificial intelligence to source, interview, and hire job applicants. These AI tools are marketed to save time, improve the quality of a workforce, and eliminate unlawful hiring biases. But is AI incapable of hiring discrimination? Can a company escape liability for discriminatory hiring because "the computer did it"?

Ultimately, whether AI is a solution or a landmine depends on how carefully companies implement the technology. AI is not immune from discrimination and federal law holds companies accountable for their hiring decisions, even if those decisions were made in a black server cabinet. The technology can mitigate bias, but only if used properly and monitored closely.

Available AI tools

The landscape of AI technology is continually growing and covers all portions of the hiring process — recruiting, interviewing, selection, and onboarding. Some companies use automated candidate sourcing technology to search social media profiles to determine which job postings should be advertised to particular candidates. Others use complex algorithms to determine which candidates' resumes best match the requirements of open positions. And some employers use video interview software to analyze facial expressions, body language, and tone to assess whether a candidate exhibits preferred traits.

Federal anti-discrimination law

Although AI tools likely have no intent to unlawfully discriminate, that does not absolve the employers who use them from liability. This is because the law contemplates both intentional discrimination (disparate treatment) and unintentional discrimination (disparate impact). The larger risk for AI lies with disparate impact claims. In such lawsuits, intent is irrelevant. The question is whether a facially neutral policy or practice (e.g., use of an AI tool) has a disparate impact on a particular protected group, defined by characteristics such as race, color, national origin, gender, or religion.

The Equal Employment Opportunity Commission, the federal agency in charge of enforcing workplace anti-discrimination laws, has demonstrated an interest in AI and has indicated that such technology is not an excuse for discriminatory impacts.

Discrimination associated with AI tools

The diversity of AI tools means that each type of technology presents unique potential for discrimination. One common thread, however, is the potential for input data to create a discriminatory impact. Many algorithms rely on a set of inputs to understand search parameters. For example, a resume screening tool is often set up by uploading sample resumes of high-performing employees. If those resumes favor a particular race or gender, and the tool is instructed to find comparable resumes, then the technology will likely reinforce the existing homogeneity.

Some examples are less obvious. Sample resumes may include employees from certain zip codes that are home to predominately one race or color. An AI tool may favor those zip codes, disfavoring applicants from other zip codes of different racial composition. Older candidates may be disfavored by an algorithm's preference for ".edu" email addresses. In short, if a workforce is largely comprised of one race or one gender, having the tool rely on past hiring decisions could negatively impact applicants of another race or gender.
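As a hypothetical illustration of this proxy effect, consider the sketch below. All features and numbers are invented: the screener ranks applicants purely by similarity to past high performers, never sees race or age, and still separates otherwise identical candidates based on zip code cluster and email domain.

```python
# Minimal sketch with invented features: a similarity-based resume screener.
# Feature vector per applicant: [years_experience, zip_group, has_edu_email],
# where zip_group stands in for a cluster of zip codes whose residents are
# predominantly one demographic.
import numpy as np

# Resumes of past "high performers" used to set up the tool.
high_performers = np.array([
    [5.0, 0, 1],
    [7.0, 0, 1],
    [6.0, 0, 1],
])
centroid = high_performers.mean(axis=0)

applicants = {
    "A (zip group 0, .edu email)":  np.array([6.0, 0, 1]),
    "B (zip group 1, .edu email)":  np.array([6.0, 1, 1]),
    "C (zip group 0, other email)": np.array([6.0, 0, 0]),
}

# All three applicants have identical experience; only the proxies differ.
for name, x in applicants.items():
    score = -np.linalg.norm(x - centroid)  # higher (closer to 0) ranks better
    print(f"{name}: score = {score:.2f}")
# A scores 0.00; B and C score -1.00 each, purely because of proxy features.
```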

Steps to mitigate risk

There are a handful of steps that employers can take to use these technologies and remain compliant with anti-discrimination laws.

First, companies should demand that AI vendors disclose as much as possible about how their products work. Vendors may be reticent to disclose details about proprietary information, but employers will ultimately be responsible for discriminatory impacts. Thus, as part of contract negotiations, a company should consider seeking indemnification from the vendor for discrimination claims.

Second, companies should consider auditing the tool to ensure it does not yield a disparate impact on protected individuals. Along the same lines, companies should be careful in selecting input data. If the inputs reflect a diverse workforce, a properly functioning algorithm should, in theory, replicate that diversity.
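One concrete form such an audit can take is a check against the EEOC's "four-fifths rule" of thumb, under which a selection rate for any group below 80 percent of the highest group's rate is generally regarded as evidence of adverse impact. A minimal sketch, with made-up applicant counts:

```python
# Adverse-impact check in the spirit of the EEOC's four-fifths rule: flag any
# group whose selection rate falls below 80% of the highest group's rate.
def adverse_impact(selected: dict, applied: dict) -> None:
    rates = {g: selected[g] / applied[g] for g in applied}
    best = max(rates.values())
    for group, rate in rates.items():
        ratio = rate / best
        flag = "FLAG" if ratio < 0.8 else "ok"
        print(f"{group}: selection rate {rate:.0%}, impact ratio {ratio:.2f} [{flag}]")

# Hypothetical audit numbers for an AI screening tool's output.
adverse_impact(
    selected={"group A": 60, "group B": 30},
    applied={"group A": 100, "group B": 100},
)
# group A: 60% selected, ratio 1.00 [ok]; group B: 30%, ratio 0.50 [FLAG]
```

A flagged ratio is not automatic proof of discrimination, but it is the kind of result that should prompt a closer look at the tool and its input data.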

Third, employers should stay abreast of developments in the law. This is an emerging field, and state legislators have taken notice. Illinois recently passed regulation governing the use of AI in the workplace, and other states, including New York, have introduced similar bills.

AI can solve many hiring challenges and help cultivate a more diverse and qualified workforce. But the tools are often only as unbiased as the creators and users of that technology. Careful implementation will ensure AI becomes a discrimination solution — not a landmine.

------

Kevin White is a partner and Dan Butler is an associate with Hunton Andrews Kurth LLP, which has an office in Houston.

Jim Havelka, founder and CEO of InformAI, joins the Houston Innovators Podcast to discuss the difference his technology can make on the health care industry. Photo courtesy of InformAI

Houston health tech founder shares the monumental impact data can have on health care

HOUSTON INNOVATORS PODCAST EPISODE 68

Hospitals are processing massive amounts of data on a daily basis — but few are optimizing this information in life-saving capacities. A Houston company is seeking to change that.

InformAI has created several tech products to allow hospitals to tap into their data for game-changing health care.

"The convergence of technology, data, and deep learning has really opened up an avenue to look at large volumes of information and look at patterns that can be helpful in patient diagnosis and treatment planning," says CEO Jim Havelka on this week's episode of the Houston Innovators Podcast.

The InformAI team has developed two platforms within which each of the company's tech products operates. One focuses on medical images and looks for subtle patterns of a medical condition, while the other can datamine patient information to identify patient risk predictors.

Currently, InformAI's sinusitis-focused product is undergoing the Food and Drug Administration approval process. About a quarter of the population has sinus-related issues, and the technology can help with diagnosis and treatment, Havelka says.

"The data that we train our algorithms on are equivalent of 30 careers of a typical ear, nose, and throat surgeon. We see 30 times more patients in our training set than an ENT physician would see in a lifetime," Havelka says. "Being able to bring into play the patterns and unique subtleties that this data can bring into the decision making only makes the ENT more productive and more efficient, as well as creates better outcomes for patients."

InformAI has received venture capital support as well as a National Science Foundation award to advance its work. The company hopes to introduce a new round of funding later this year.

Havelka doesn't mince words when it comes to the importance of InformAI being located in Houston. The company's team works out of JLABS @ TMC as well as TMC Innovation Institute.

"Those relationships have been very helpful in getting data to build these particular products," Havelka says. "Just the Texas Medical Center alone has roughly 10 million patient encounters every year. The ability to get access to data and, equally important, the medical experts has been a tremendous benefit to InformAI."

Havelka discusses more about the revolutionary technology InformAI is working on — as well as advice he has for other health tech founders — on the episode. Listen to the full interview below — or wherever you stream your podcasts — and subscribe for weekly episodes.



Major corporation opens hub for global decarbonization in Houston

seeing green

Management consulting giant McKinsey & Co. plans to spend $100 million over the next decade to pump up Houston’s decarbonization economy.

McKinsey says the initiative will, among other things, focus on:

  • Promoting innovations like carbon capture, utilization, and storage (CCUS) and green hydrogen
  • Revamping business models for carbon-heavy companies
  • Ramping up the community of local startups involved in energy transition
  • Developing talent to work on decarbonization

As part of this program, McKinsey has set up a decarbonization hub in its Houston office, at 609 Main St.

“Decarbonization will lead to a new chapter of economic development, while also addressing a critical problem of climate change,” McKinsey partner Nikhil Ati says.

Global decarbonization efforts over the next three decades will require a $100 trillion investment, according to Utility Dive. Houston, home to 40 percent of the nation's publicly traded oil and gas companies, stands to gain a substantial share of that opportunity.

McKinsey’s Houston office has worked for several years on Houston’s energy transition initiatives. For instance, the firm helped produce a study and a whitepaper on energy transition here. The whitepaper outlines Houston’s future as the “epicenter of a global clean hydrogen hub.”

“Texas is the nation’s largest renewable energy producer, home to half of the nation’s hydrogen pipelines, and its companies have unparalleled capabilities in building and operating complex projects,” McKinsey senior partner Filipe Barbosa says. “This is Houston’s moment in time on the global stage.”

McKinsey estimates that a Houston-based global hub for clean hydrogen, if in place by 2050, could generate 180,000 jobs and create an economic impact of $100 billion.

3 Houston innovators to know this week

who's who

Editor's note: In this week's roundup of Houston innovators to know, I'm introducing you to three local innovators across industries — from photonics to robotics — recently making headlines in Houston innovation.

Brad Burke, managing director of the Rice Alliance for Technology and Entrepreneurship

Brad Burke joins this week's Houston Innovators Podcast. Photo via alliance.rice.edu

Collaboration has made a world of difference for growing Houston's innovation ecosystem, according to Brad Burke, managing director of the Rice Alliance for Technology and Entrepreneurship.

"I think Houston has this culture of collaboration that I suspect that some other major cities don't have in the same way," Burke says on the Houston Innovators Podcast. "And while we're a big city, the entrepreneurial ecosystem feels like a small network of a lot of people who work really well together."

Burke has played a major role in fostering that collaboration for the past 20 years leading the Rice Alliance, which coordinates many event programs and accelerators — including the Rice Business Plan Competition, energy and life science forums, the Clean Energy Accelerator, Owl Spark, Blue Launch, and more. Click here to read more.

Trevor Best, CEO and co-founder of Syzygy Plasmonics

A new partnership for Houston-based Syzygy will generate 1.2 million tons of clean hydrogen each year in South Korea by 2030. Image via Syzygy

Houston-area energy tech startup Syzygy Plasmonics is part of a new partnership that will develop a fully electric chemical reactor for production of clean hydrogen in South Korea.

The reactor will be installed in the second half of 2023 at Lotte Fine Chemical’s facilities in Ulsan, South Korea. Lotte Fine Chemical, Lotte Chemical, and Sumitomo Corporation of Americas are Syzygy’s partners in this venture.

“Simply improving existing tech isn’t enough to reach the world’s decarbonization goals. Stopping climate change will require industries to reimagine what is possible,” Syzygy co-founder and CEO Trevor Best says in a news release. “Our technology expands the accepted paradigms of chemical engineering. We have demonstrated the ability to replace heat from combustion with renewable electricity in the manufacture of foundational chemicals like hydrogen.” Click here to read more.

Nicolaus Radford, CEO and founder of Nauticus Robotics

Houston-based Nauticus Robotics has hit the public market. Image via LinkedIn

Fresh off its September 13 debut as a publicly traded company, Webster-based Nauticus Robotics Inc. is aiming for $90 million in revenue next year as it dives deeper into the ocean economy.

The stock of Nauticus now trades on the NASDAQ market under the ticker symbol KITT. Nauticus went public following its SPAC (special purpose acquisition company) merger with New York City-based CleanTech Acquisition Corp., a “blank check” company that went public in July 2021 through a $150 million IPO. The SPAC deal was valued at $560 million when it was announced in December.

Nauticus continues to be led by CEO Nicolaus Radford and the current executive team.

“The closing of this business combination represents a pivotal milestone in our company’s history as we take public our pursuit of transforming the ocean robotics industry with autonomous systems,” says Radford, who founded what was known as Houston Mechatronics in 2014. “Not only is the ocean a tremendous economic engine, but it is also the epicenter for building a sustainable future.” Click here to read more.

Houston startup snags prestigious grant from global health leader

big win

A female-founded biotech startup has announced that it has received a grant from the Bill & Melinda Gates Foundation.

Steradian Technologies has developed a breath-based collection device that can be used with diagnostic testing systems. Called RUMI, the device is non-invasive and fully portable and, according to a news release, costs the price of a latte.

“We are extremely honored to receive this award and be recognized by the Bill & Melinda Gates Foundation, a leader in global health. This funding will propel our work in creating deep-tech diagnostics and products to close the equity gap in global public health," says Asma Mirza, CEO and co-founder of Steradian Technologies, in the release. “The RUMI will demonstrate that advanced technology can be delivered to all areas of the world, ensuring the Global South and economically exploited regions receive access to high-fidelity diagnostics instead of solutions that are ill-suited to the environment.”

RUMI uses novel photon-based detection to collect and diagnose infectious diseases in breath within 30 seconds, per the release. It will be the first human bio-aerosol specimen collector to convert breath into a fully sterile liquid sample, and it can be used for many applications in direct disease detection.

"As the healthcare industry continues to pursue less invasive diagnostics, we are very excited that the foundation has identified our approach to breath-based sample collection as a standout worthy of their support," says John Marino, chief of product development and co-founder. “We look forward to working with them to achieve our goals of better, faster, and safer diagnostics."

Founded in 2017, Steradian Technologies is funded and supported by XPRIZE, Johnson & Johnson’s Lung Cancer Initiative, JLABS TMCi, Capital Factory, Duke Institute of Global Health, and Johnson & Johnson’s Center for Device Innovation.

The amount granted by the Bill & Melinda Gates Foundation was not disclosed. The Seattle-based foundation is led by CEO Mark Suzman and co-chaired by Bill Gates and Melinda French Gates.