Let's talk about dark data — what it means and how to navigate it. Graphic by Miguel Tovar/University of Houston

Is it necessary to share ALL your data? Is transparency a good thing, or does it make researchers “vulnerable,” as author Nathan Schneider suggests in the Chronicle of Higher Education article “Why Researchers Shouldn’t Share All Their Data”?

Dark Data Defined

Dark data is defined as the universe of information an organization collects, processes and stores – oftentimes for compliance reasons. Dark data never makes it to the official publication part of the project. According to the Gartner Glossary, “storing and securing data typically incurs more expense (and sometimes greater risk) than value.”

This topic is reminiscent of the file drawer effect, a phenomenon which reflects the influence of the results of a study on whether or not the study is published. Negative results can be just as important as hypotheses that are proven.

Publication bias and the need to only publish positive research that supports the PI’s hypothesis, it can be argued, is not good science. According to an article in the Indian Journal of Anaesthesia, authors Priscilla Joys Nagarajan, et al., wrote: “It is speculated that every significant result in the published world has 19 non-significant counterparts in file drawers.” That’s one definition of dark data.

Total Transparency

But what to do with all your excess information that did not make it to publication, most likely because of various constraints? Should everything, meaning every little tidbit, be readily available to the research community?

Schneider doesn’t think it should be. In his article, he writes that he hides some findings in a paper notebook or behind a password, and he keeps interviews and transcripts offline altogether to protect his sources.

Open-source

Open-source software communities tend to regard total transparency as inherently good. What are the advantages of total transparency? You may make connections between projects that you wouldn’t have otherwise. You can easily reproduce a peer’s experiment. You can even become more meticulous in your note-taking and experimental methods since you know it’s not private information. Similarly, journalists will recognize this thought pattern as the recent, popular call to engage in “open journalism.” Essentially, an author’s entire writing and editing process can be recorded, step by step.

TMI

This trend has led researchers to open-source programs like Jupyter and GitHub. Open-source programs detail every change that occurs along a project’s timeline. Are unorganized, excessive amounts of unpublishable data really what transparency means? Or do they confuse those looking for meaningful research that is meticulously curated?

The Big Idea

And what about the “vulnerability” claim? Sharing every edit and every new direction taken opens a scientist up to scoffers and harassment, even. Dark data in industry even involves publishing salaries, which can feel unfair to underrepresented, marginalized populations.

In Model View Culture, Ellen Marie Dash wrote: “Let’s give safety and consent the absolute highest priority, with openness and transparency prioritized explicitly below those. This means digging deep, properly articulating in detail what problems you are trying to solve with openness and transparency, and handling them individually or in smaller groups.”

------

This article originally appeared on the University of Houston's The Big Idea. Sarah Hill, the author of this piece, is the communications manager for the UH Division of Research.

A new UH-led program will work with energy corporations to prepare the sector's future workforce. Photo via Getty Images

University of Houston leads data science collaboration to propel energy transition

seeing green

Five Texas schools have teamed up with energy industry partners to create a program to train the sector's future workforce. At the helm of the initiative is the University of Houston.

The Data Science for Energy Transition project, which is funded through 2024 by a $1.49 million grant from the National Science Foundation, includes participation from UH, the University of Houston-Downtown, the University of Houston-Victoria, the University of Houston-Clear Lake, and Sam Houston State University.

The project will begin by introducing a five-week data science camp next summer where undergraduate and master’s level students will examine data science skills already in demand, as well as the skills that will be needed in the future as the sector navigates a shift to new technologies.

The camp will encompass computer science and programming, statistics, machine learning, geophysics and earth science, public policy, and engineering, according to a news release from UH. The project’s principal investigator is Mikyoung Jun, ConocoPhillips professor of data science at the UH College of Natural Sciences and Mathematics.

The new program's principal investigator is Mikyoung Jun. Photo via UH.edu

“It’s obvious that the Houston area is the capital for the energy field. We are supporting our local industries by presenting talented students from the five sponsoring universities and other Texas state universities with the essential skills to match the growing needs within those data science workforces,” Jun says in the release. “We’re planning all functions in a hybrid format so students located outside of Houston, too, can join in.”

Jun describes the camp as having a dual focus: the transition to renewable energy sources and traditional energy, which is not being eradicated any time soon, she explains.

Also setting the program apart are the camp's prerequisites, or lack thereof. The program is open to majors in energy-related fields, such as data science or petroleum engineering, as well as wide-ranging fields of study, such as business, art, history, law, and more.

“The camp is not part of a degree program and its classes do not offer credits toward graduation, so students will continue to follow their own degree plan,” Jun says in the release. “Our goal with the summer camp is to give students a solid footing in data science and energy-related fields to help them focus on skills needed in data science workforces in energy-related companies in Houston and elsewhere. Although that may be their first career move, they may settle in other industries later. Good skills in data processing can make them wise hires for many technology-oriented organizations.”

Jun's four co-principal investigators are Pablo Pinto, professor at UH’s Hobby School of Public Affairs and director of the Center for Public Policy; Jiajia Sun, UH assistant professor of geophysics; Dvijesh Shastri, associate professor of computer science, UH-Downtown; and Yun Wan, professor of computer information systems and chair of the Computer Science Division, UH-Victoria. Eleven other faculty members from the five schools will serve as senior personnel. The initiative's energy industry partners include ConocoPhillips, Schlumberger, Fugro, Quantico Energy Solutions, Shell, and Xecta Web Technologies.

The program's first iteration will select 40 students to participate in the camp this summer. Applications, which have not opened yet, will be made available online.

The Data Science for Energy Transition project is a collaboration between five schools. Image via UH.edu

Houston companies need cybersecurity professionals — and universities can help. Photo via Getty Images

How universities can help equip Houston with a skilled cybersecurity workforce

guest column

With an increasing number of data breaches, a high job growth rate, and a persistent skills gap, cybersecurity professionals will be some of the most in-demand workers in 2022. It’s more important than ever to have people that are properly trained to protect individuals, corporations, and communities.

Demand for cybersecurity talent in Texas is high. According to Burning Glass Labor Insights, employers in the Houston metro area have posted over 24,000 cybersecurity jobs since the beginning of 2021. But the pipeline of cybersecurity workers is very low, which means many local and national companies don’t have enough people on the front lines defending against these attacks.

Unfortunately, it looks like the cybersecurity skills gap is far from over. An annual industry report from the Information Systems Security Association shows that the global demand for cybersecurity skills still far exceeds the current supply of traditionally qualified individuals, with 38 percent of cybersecurity roles currently unfilled. This shortage has real-life, real-world consequences that can result in misconfigured systems and improper risk assessment and management.

How can companies help close the cybersecurity skills gap within their own organizations? We believe it will become increasingly important to look beyond “traditionally qualified” candidates and view hands-on experience as equally important as, or even more important than, the certifications or bachelor's degree requirements often found in cybersecurity job descriptions.

The top open cybersecurity roles in the Houston area include analysts, managers, engineers, and developers. Employees in these positions are essential to the everyday monitoring, troubleshooting, testing and analyzing that helps companies protect data and stay one step ahead of hackers. When looking to fill these roles, hiring managers should be looking for candidates with both the knowledge and experience to take on these critical positions.

Fortunately, Houston-based companies looking to establish, grow, or upskill their cybersecurity teams don’t have to go far to find top-tier talent and training programs. More local colleges and universities are offering alternative credential programs, like boot camps, that provide students with the deep understanding and hands-on learning they need to excel in the roles that companies need to fill.

2U, Inc. and Rice University have partnered to power a data-driven, market-responsive cybersecurity boot camp that provides students with hands-on training in networking, systems, web technologies, databases, and defensive and offensive cybersecurity. Over 40 percent of the students didn’t have bachelor's degrees prior to enrolling in the program. Since launching in 2019, the program has produced more than 140 graduates, some of whom have gone on to work in cybersecurity roles at local companies such as CenterPoint Energy, Fulcrum Technology Solutions, and Hewlett Packard.

Recognizing programs like university boot camps as local workforce generators not only gives companies a larger talent pool to recruit from, but also increases the opportunity for cybersecurity teams to diversify and include professionals with different experiences and backgrounds. We’re living in a security-first world, and the right mix of cybersecurity talent is essential to keeping us protected wherever we are.

------

David Vassar is the assistant dean of Susanne M. Glasscock School of Continuing Studies at Rice University. Bret Fund is vice president overseeing cybersecurity programs at 2U.

"Better and personalized healthcare through AI is still a hugely challenging problem that will take an army of scientists and engineers." Photo via UH.edu

Houston expert explains health care's inequity problem

guest column

We are currently in the midst of what some have called the "wild west" of AI. Though healthcare is one of the most heavily regulated sectors, the regulation of AI in this space is still in its infancy. The rules are being written as we speak. We are playing catch-up by learning how to reap the benefits these technologies offer while minimizing any potential harms once they've already been deployed.

AI systems in healthcare exacerbate existing inequities. We've seen this play out with real-world consequences, from racial bias in the American justice system and credit scoring to gender bias in resume screening applications. Programs that are designed to bring machine "objectivity" and ease to our systems end up reproducing and upholding biases with no means of accountability.

The algorithm itself is seldom the problem. It is often the data used to program the technology that merits concern. But this is about far more than ethics and fairness. Building AI tools that take account of the whole picture of healthcare is fundamental to creating solutions that work.

The Algorithm is Only as Good as the Data

By nature of our own human systems, datasets are almost always partial and rarely ever fair. As Linda Nordling comments in a Nature article, A fairer way forward for AI in healthcare, "this revolution hinges on the data that are available for these tools to learn from, and those data mirror the unequal health system we see today."

Take, for example, the finding that Black people in US emergency rooms are 40 percent less likely to receive pain medication than are white people, and Hispanic patients are 25 percent less likely. Now, imagine the dataset these findings are based on is used to train an algorithm for an AI tool that would be used to help nurses determine if they should administer pain relief medication. These racial disparities would be reproduced and the implicit biases that uphold them would remain unquestioned, and worse, become automated.

We can attempt to reduce these biases by removing the data we believe causes the bias in training, but there will still be hidden patterns that correlate with demographic data. An algorithm cannot take in the nuances of the full picture; it can only learn from patterns in the data it is presented with.
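
The point about hidden patterns can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records are synthetic, the group labels "A" and "B" and the neighborhood feature are hypothetical, and the "model" is just a per-neighborhood treatment rate rather than any real clinical tool. Even though the group column is never used for training, the learned rates still differ by group, because neighborhood stands in for it.

```python
from collections import defaultdict

# Synthetic, made-up records: (group, neighborhood, received_pain_medication).
# In this toy data, neighborhood is perfectly correlated with group.
records = (
    [("A", "north", 1)] * 8 + [("A", "north", 0)] * 2
    + [("B", "south", 1)] * 4 + [("B", "south", 0)] * 6
)


def train_rate_model(rows, feature_index):
    """Learn the historical treatment rate for each value of one feature."""
    counts = defaultdict(lambda: [0, 0])  # feature value -> [treated, total]
    for row in rows:
        key = row[feature_index]
        counts[key][0] += row[2]
        counts[key][1] += 1
    return {key: treated / total for key, (treated, total) in counts.items()}


def predicted_rate_by_group(rows, model):
    """Average the model's predicted rate over the members of each group."""
    predictions = defaultdict(list)
    for group, neighborhood, _ in rows:
        predictions[group].append(model[neighborhood])
    return {g: sum(p) / len(p) for g, p in predictions.items()}


# "Remove" the sensitive column by training on neighborhood only (index 1).
model = train_rate_model(records, feature_index=1)
rates = predicted_rate_by_group(records, model)
print(rates)  # roughly 0.8 for group A and 0.4 for group B
```

In this toy setup the disparity survives intact: dropping the demographic column changed what the model sees, not what it learns.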

Bias Creep

Data bias creeps into healthcare in unexpected ways. Consider the fact that animal models used in laboratories across the world to discover and test new pain medications are almost entirely male. As a result, many medications, including pain medication, are not optimized for females. So, it makes sense that even common pain medications like ibuprofen and naproxen have been proven to be more effective in men than women and that women tend to experience worse side effects from pain medication than men do.

In reality, male rodents aren't perfect test subjects either. Studies have also shown that both female and male rodents' responses to pain levels differ depending on the sex of the human researcher present. The stress response elicited in rodents to the olfactory presence of a sole male researcher is enough to alter their responses to pain.

While this example may seem to be a departure from AI, it is in fact deeply connected — the current treatment choices we have access to were implicitly biased before the treatments ever made it to clinical trials. The challenge of AI equity is not a purely technical problem, but a very human one that begins with the choices that we make as scientists.

Unequal Data Leads to Unequal Benefits

In order for all of society to enjoy the many benefits that AI systems can bring to healthcare, all of society must be equally represented in the data used to train these systems. While this may sound straightforward, it's a tall order to fill.

Data from some populations don't always make it into training datasets. This can happen for a number of reasons. Some data may not be as accessible or it may not even be collected at all due to existing systemic challenges, such as a lack of access to digital technology or simply being deemed unimportant. Predictive models are created by categorizing data in a meaningful way. But because there's generally less of it, "minority" data tends to be an outlier in datasets and is often wiped out as spurious in order to create a cleaner model.
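
A short sketch shows how routine cleaning can wipe out a small group. It assumes a simple three-standard-deviation outlier rule and uses synthetic numbers; the group names, values, and threshold are all hypothetical and not drawn from any real dataset.

```python
from statistics import mean, pstdev

# Synthetic measurements: 95 from a majority group, 5 from a minority group
# whose typical values happen to sit higher.
majority = [("majority", v) for v in [9.0, 10.0, 11.0] * 32][:95]
minority = [("minority", v) for v in [19.0, 20.0, 21.0, 20.0, 20.0]]
data = majority + minority

# Mean and spread are computed over the pooled data, so they are
# dominated by the majority group.
values = [v for _, v in data]
mu, sigma = mean(values), pstdev(values)


def drop_outliers(rows, mu, sigma, z_max=3.0):
    """Keep only rows within z_max standard deviations of the pooled mean."""
    return [(g, v) for g, v in rows if abs(v - mu) <= z_max * sigma]


kept = drop_outliers(data, mu, sigma)
kept_counts = {
    g: sum(1 for grp, _ in kept if grp == g) for g in ("majority", "minority")
}
print(kept_counts)  # {'majority': 95, 'minority': 0}
```

The filter never looks at group membership, yet every minority record is discarded, leaving a "cleaner" dataset that no longer represents that group at all.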

Data source matters because this detail unquestionably affects the outcome and interpretation of healthcare models. In sub-Saharan Africa, young women are diagnosed with breast cancer at a significantly higher rate. This reveals the need for AI tools and healthcare models tailored to this demographic group, as opposed to AI tools used to detect breast cancer that are only trained on mammograms from the Global North. Likewise, a growing body of work suggests that algorithms used to detect skin cancer tend to be less accurate for Black patients because they are trained mostly on images of light-skinned patients. The list goes on.

We are creating tools and systems that have the potential to revolutionize the healthcare sector, but the benefits of these developments will only reach those represented in the data.

So, what can be done?

Part of the challenge in getting bias out of data is that high volume, diverse and representative datasets are not easy to access. Training datasets that are publicly available tend to be extremely narrow, low-volume, and homogenous—they only capture a partial picture of society. At the same time, a wealth of diverse health data is captured every day in many healthcare settings, but data privacy laws make accessing these more voluminous and diverse datasets difficult.

Data protection is of course vital. Big Tech and governments do not have the best track record when it comes to the responsible use of data. However, if transparency, education, and consent for the sharing of medical data were more purposefully regulated, far more diverse and high-volume data sets could contribute to fairer representation across AI systems and result in better, more accurate results for AI-driven healthcare tools.

But data sharing and access is not a complete fix to healthcare's AI problem. Better and personalized healthcare through AI is still a hugely challenging problem that will take an army of scientists and engineers. At the end of the day, we want to teach our algorithms to make good choices but we are still figuring out what good choices should look like for ourselves.

AI presents the opportunity to bring greater personalization to healthcare, but it equally presents the risk of entrenching existing inequalities. We have the opportunity in front of us to take a considered approach to data collection, regulation, and use that will provide a fuller and fairer picture and enable the next steps for AI in healthcare.

------

Angela Wilkins is the executive director of the Ken Kennedy Institute at Rice University.

This health tech company has made some significant changes in order to keep up with its growth. Photo via Getty Images

Houston data solutions startup rebrands, expands to support neuroscience research

startup soars

With a new CEO and chief operating officer aboard, Houston-based DataJoint is thinking small in order to go big.

Looking ahead to 2022, DataJoint aims to enable hundreds of smaller projects rather than a handful of mega-projects, CEO Dimitri Yatsenko says. DataJoint develops data management software that empowers collaboration in the neuroscience and artificial intelligence sectors.

"Our strategy is to take the lessons that we have learned over the past four years working with major projects with multi-institutional consortia," Yatsenko says, "and translate them into a platform that thousands of labs can use efficiently to accelerate their research and make it more open and rigorous."

Ahead of that shift, the startup has undergone some significant changes, including two moves in the C-suite.

Yatsenko became CEO in February after stints as vice president of R&D and as president. He co-founded the company as Vathes LLC in 2016. Yatsenko succeeded co-founder Edgar Walker, who had been CEO since May 2020 and was vice president of engineering before that.

In tandem with Yatsenko's ascent to CEO, the company brought aboard Jason Kirkpatrick as COO. Kirkpatrick previously was chief financial officer of Houston-based Darcy Partners, an energy industry advisory firm; chief operating officer and chief financial officer of Houston-based Solid Systems CAD Services (SSCS), an IT services company; and senior vice president of finance and general manager of operations at Houston-based SmartVault Corp., a cloud-based document management company.

"Most of our team are scientists and engineers. Recruiting an experienced business leader was a timely step for us, and Jason's vast leadership experience in the software industry and recurring revenue models added a new dimension to our team," Yatsenko says.

Other recent changes include:

  • Converting from an LLC structure to a C corporation structure to enable founders, employees, and future investors to be granted shares of the company's stock.
  • Shortening the business' name to DataJoint from DataJoint Neuro and recently launching its rebranded website.
  • Moving the company's office from the Texas Medical Center Innovation Institute (TMCx) to the Galleria area. The new space will make room for more employees. Yatsenko says the 12-employee startup plans to increase its headcount to 15 to 20 by the end of this year.

Over the past five years, the company's customer base has expanded to include neuroscience institutions such as Princeton University's Princeton Neuroscience Institute and Columbia University's Zuckerman Institute for Brain Science, as well as University College London and the Norwegian University of Science and Technology. DataJoint's growth has been fueled in large part by grants from the U.S. Defense Advanced Research Projects Agency (DARPA) and the Brain Research Through Advancing Innovative Neurotechnologies (BRAIN) Initiative at the National Institutes of Health (NIH).

"The work we are tackling has our team truly excited about the future, particularly the capabilities being offered to the neuroscience community to understand how the brain forms perceptions and generates behavior," Yatsenko says.

Ryan Sitton joins the Houston Innovators Podcast to discuss his career in data and reliability. Photo courtesy of Ryan Sitton

This Houston innovator is using data to power industrial reliability and sustainability

HOUSTON INNOVATORS PODCAST EPISODE 92

Ryan Sitton has had a varied career so far. Formerly working in oil and gas, he started his own company in 2006 to help companies to better utilize their data. Now, still leading Houston-based Pinnacle as CEO, Sitton works with the world's largest companies to solve their problems with data. He also served as Texas Railroad Commissioner and has written two books about decision-making and leadership.

Sitton joined the Houston Innovators Podcast to discuss how, despite the multiple hats he wears, his core passion is using data to drive better decision making and more sustainable and reliable operations.

"I was basically doing data analytics in the mid 2000s before it was sexy. I was pulling together data in chemical plants and refineries and trying to predict how these plants would behave with the data I had," Sitton says. "I realized early on how there was so much opportunity here — but we don't have the technologies or the methodology to do it."

But over the years, the technology has caught up and now Sitton is able to provide clients with even more data-driven solutions.

"We go into the biggest companies of the world that are trying to have more reliability at the same time as lowering cost," Sitton explains. "We build models that pull together literally millions of pieces of data and we do a combination of engineering analysis and data analysis and data science to give them better predictions — better ways to run their facilities so that they are more sustainable and more profitable."

Sitton's two books — Crucial Decisions, which is out now, and the Myth of Status, which is coming later this year — also center around decision making and leadership.

On the episode, he discusses his time at the Texas Railroad Commission, how COVID-19 affected business, and his advice for startups tapping into data-driven solutions. Listen to the full interview below, or wherever you stream your podcasts, and subscribe for weekly episodes.



Deadline extended: InnovationMap, HX open nominations for new combined awards gala

calling all innovators

Update: The deadline for nominations has been extended to midnight on Sunday, October 2.

InnovationMap is back to honor local startups and innovators — and this time, we've upped the ante.

Houston Exponential and InnovationMap have teamed up to combine their annual awards and event efforts to premiere a brand new program. The Houston Innovation Awards Gala on Wednesday, November 9, at The Ion will be a comprehensive event honoring Houston founders, innovators, investors, and more. InnovationMap and HX, which was acquired earlier this year, are in the same network of ownership.

Nominations are open online until midnight October 2, and nominees will have until October 11 to complete an additional application that will be emailed to nominees directly. A group of industry experts and Houston innovation leaders will review those submissions and determine finalists and winners across 11 categories. The categories for this year's awards are:

  • BIPOC-Owned Business honoring an innovative company founded or co-founded by BIPOC representation
  • Female-Owned Business honoring an innovative company founded or co-founded by a woman
  • Hardtech Business honoring an innovative company developing and commercializing a physical technology across life science, energy, space, and beyond
  • B2B Software Business honoring an innovative company developing and programming a digital solution to impact the business sector
  • Green Impact Business honoring an innovative company providing a solution within renewables, climatetech, clean energy, alternative materials, and beyond
  • Smart City Business honoring an innovative company providing a tech solution within transportation, infrastructure, data, and beyond
  • New to Hou honoring an innovative company, accelerator, or investor that has relocated its primary operations to Houston within the past three years
  • DEI Champion honoring an individual who is leading impactful diversity, equity, and inclusion initiatives and progress within Houston and their organization
  • Investor of the Year honoring an individual who is leading venture capital or angel investing
  • Mentor of the Year honoring an individual who dedicates their time and expertise to guide and support budding entrepreneurs
  • People's Choice: Startup of the Year selected via an interactive voting portal during the event
Nominees can be submitted to multiple categories.

Additionally, the awards gala will honor an innovator who's made a lasting impact on the Houston innovation community. While you may nominate an individual for the Trailblazer Award via the online form, the judging committee will not require applications or nominations for this category and will be considering potential honorees from the ecosystem at large. If you are interested in sponsorship opportunities, please reach out to cbuckner@houstonexponential.org.

Last year, InnovationMap introduced its awards program and named 28 finalists and honored the nine winners on September 8. Click here to see more from last year's event.

Tickets for the November 9 event are available online. Early bird tickets will be $60 per person and startup founders will be able to attend for $25.



Major corporation opens hub for global decarbonization in Houston

seeing green

Management consulting giant McKinsey & Co. plans to spend $100 million over the next decade to pump up Houston’s decarbonization economy.

McKinsey says the initiative will, among other things, focus on:

  • Promoting innovations like carbon capture, utilization, and storage (CCUS) and green hydrogen
  • Revamping business models for carbon-heavy companies
  • Ramping up the community of local startups involved in energy transition
  • Developing talent to work on decarbonization

As part of this program, McKinsey has set up a decarbonization hub in its Houston office, at 609 Main St.

“Decarbonization will lead to a new chapter of economic development, while also addressing a critical problem of climate change,” McKinsey partner Nikhil Ati says.

Global decarbonization efforts over the next three decades will require a $100 trillion investment, according to Utility Dive. Houston, home to 40 percent of publicly traded oil and gas companies, stands to gain a substantial share of that opportunity.

McKinsey’s Houston office has worked for several years on Houston’s energy transition initiatives. For instance, the firm helped produce a study and a whitepaper on energy transition here. The whitepaper outlines Houston’s future as the “epicenter of a global clean hydrogen hub.”

“Texas is the nation’s largest renewable energy producer, home to half of the nation’s hydrogen pipelines, and its companies have unparalleled capabilities in building and operating complex projects,” McKinsey senior partner Filipe Barbosa says. “This is Houston’s moment in time on the global stage.”

McKinsey estimates a Houston-based global hub for clean hydrogen that’s in place by 2050 could generate 180,000 jobs and create an economic impact of $100 billion.

3 Houston innovators to know this week

who's who

Editor's note: In this week's roundup of Houston innovators to know, I'm introducing you to three local innovators across industries — from photonics to robotics — recently making headlines in Houston innovation.

Brad Burke, managing director of the Rice Alliance for Technology and Entrepreneurship

Brad Burke joins this week's Houston Innovators Podcast. Photo via alliance.rice.edu

Collaboration has made a world of difference for growing Houston's innovation ecosystem, according to Brad Burke, managing director of the Rice Alliance for Technology and Entrepreneurship.

"I think Houston has this culture of collaboration that I suspect that some other major cities don't have in the same way," Burke says on the Houston Innovators Podcast. "And while we're a big city, the entrepreneurial ecosystem feels like a small network of a lot of people who work really well together."

Burke has played a major role in the collaboration of Houston for the past 20 years leading the Rice Alliance, which coordinates many event programs and accelerators — including the Rice Business Plan Competition, energy and life science forums, the Clean Energy Accelerator, Owl Spark, Blue Launch, and more. Click here to read more.

Trevor Best, CEO and co-founder of Syzygy Plasmonics

A new partnership for Houston-based Syzygy will generate 1.2 million tons of clean hydrogen each year in South Korea by 2030. Image via Syzygy

Houston-area energy tech startup Syzygy Plasmonics is part of a new partnership that will develop a fully electric chemical reactor for production of clean hydrogen in South Korea.

The reactor will be installed in the second half of 2023 at Lotte Fine Chemical’s facilities in Ulsan, South Korea. Lotte Fine Chemical, Lotte Chemical, and Sumitomo Corporation of Americas are Syzygy’s partners in this venture.

“Simply improving existing tech isn’t enough to reach the world’s decarbonization goals. Stopping climate change will require industries to reimagine what is possible,” Syzygy co-founder and CEO Trevor Best says in a news release. “Our technology expands the accepted paradigms of chemical engineering. We have demonstrated the ability to replace heat from combustion with renewable electricity in the manufacture of foundational chemicals like hydrogen.” Click here to read more.

Nicolaus Radford, CEO and founder of Nauticus Robotics

Houston-based Nauticus Robotics has hit the public market. Image via LinkedIn

Fresh off its September 13 debut as a publicly traded company, Webster-based Nauticus Robotics Inc. is aiming for $90 million in revenue next year as it dives deeper into the ocean economy.

The stock of Nauticus now trades on the NASDAQ market under the ticker symbol KITT. Nauticus went public following its SPAC (special purpose acquisition company) merger with New York City-based CleanTech Acquisition Corp., a “blank check” company that went public in July 2021 through a $150 million IPO. The SPAC deal was valued at $560 million when it was announced in December.

Nauticus continues to be led by CEO Nicolaus Radford and the current executive team.

“The closing of this business combination represents a pivotal milestone in our company’s history as we take public our pursuit of transforming the ocean robotics industry with autonomous systems,” says Radford, who founded what was known as Houston Mechatronics in 2014. “Not only is the ocean a tremendous economic engine, but it is also the epicenter for building a sustainable future.” Click here to read more.