Let's talk about dark data — what it means and how to navigate it. Graphic by Miguel Tovar/University of Houston

Is it necessary to share ALL your data? Is transparency a good thing, or does it make researchers “vulnerable,” as author Nathan Schneider suggests in the Chronicle of Higher Education article “Why Researchers Shouldn’t Share All Their Data”?

Dark Data Defined

Dark data is defined as the universe of information an organization collects, processes, and stores, oftentimes for compliance reasons alone. It is the data that never makes it into the official publication of a project. According to the Gartner Glossary, “storing and securing data typically incurs more expense (and sometimes greater risk) than value.”

This topic is reminiscent of the file drawer effect: the tendency for a study's results to determine whether or not the study gets published at all. Negative results can be just as important as those that confirm a hypothesis.

Publication bias, the pressure to publish only positive results that support the PI's hypothesis, arguably makes for bad science. In an article in the Indian Journal of Anaesthesia, Priscilla Joys Nagarajan et al. write: “It is speculated that every significant result in the published world has 19 non-significant counterparts in file drawers.” Those file drawers are one definition of dark data.

Total Transparency

But what should be done with all the excess information that never made it to publication, most likely because of various constraints? Should everything, down to every little tidbit, be readily available to the research community?

Schneider doesn’t think it should be. In his article, he writes that he hides some findings in a paper notebook or behind a password, and he keeps interviews and transcripts offline altogether to protect his sources.

Open-source

Open-source software communities tend to regard total transparency as inherently good. What are the advantages of total transparency? You may make connections between projects that you wouldn’t have otherwise. You can easily reproduce a peer’s experiment. You can even become more meticulous in your note-taking and experimental methods since you know it’s not private information. Similarly, journalists will recognize this thought pattern as the recent, popular call to engage in “open journalism.” Essentially, an author’s entire writing and editing process can be recorded, step by step.

TMI

This trend has led researchers to open-source platforms like Jupyter and GitHub, which record every change that occurs along a project's timeline. But is an unorganized excess of unpublishable data really what transparency means? Or does it confuse those looking for meaningful, meticulously curated research?

The Big Idea

And what about the “vulnerability” claim? Sharing every edit and every new direction taken opens a scientist up to scoffers, and even to harassment. In industry, total transparency can extend to publishing salaries, which can feel unfair to underrepresented, marginalized populations.

In Model View Culture, Ellen Marie Dash wrote: “Let’s give safety and consent the absolute highest priority, with openness and transparency prioritized explicitly below those. This means digging deep, properly articulating in detail what problems you are trying to solve with openness and transparency, and handling them individually or in smaller groups.”

------

This article originally appeared on the University of Houston's The Big Idea. Sarah Hill, the author of this piece, is the communications manager for the UH Division of Research.

"Better and personalized healthcare through AI is still a hugely challenging problem that will take an army of scientists and engineers." Photo via UH.edu

Houston expert explains health care's inequity problem

guest column

We are currently in the midst of what some have called the "wild west" of AI. Though healthcare is one of the most heavily regulated sectors, the regulation of AI in this space is still in its infancy. The rules are being written as we speak. We are playing catch-up by learning how to reap the benefits these technologies offer while minimizing any potential harms once they've already been deployed.

AI systems in healthcare exacerbate existing inequities. We've seen this play out in real-world consequences, from racial bias in the American justice system and credit scoring to gender bias in resume screening applications. Programs that are designed to bring machine "objectivity" and ease to our systems end up reproducing and upholding biases with no means of accountability.

The algorithm itself is seldom the problem. It is often the data used to train the technology that merits concern. But this is about far more than ethics and fairness. Building AI tools that account for the whole picture of healthcare is fundamental to creating solutions that work.

The Algorithm is Only as Good as the Data

By nature of our own human systems, datasets are almost always partial and rarely ever fair. As Linda Nordling comments in the Nature article "A fairer way forward for AI in healthcare": "this revolution hinges on the data that are available for these tools to learn from, and those data mirror the unequal health system we see today."

Take, for example, the finding that Black people in US emergency rooms are 40 percent less likely to receive pain medication than are white people, and Hispanic patients are 25 percent less likely. Now, imagine the dataset these findings are based on is used to train an algorithm for an AI tool that would be used to help nurses determine if they should administer pain relief medication. These racial disparities would be reproduced and the implicit biases that uphold them would remain unquestioned, and worse, become automated.

We can attempt to mitigate these biases by removing from the training data the variables we believe cause the bias, but hidden patterns that correlate with demographic data will remain. An algorithm cannot take in the nuances of the full picture; it can only learn from patterns in the data it is presented with.
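
To make that concrete, here is a minimal, hypothetical sketch in Python (synthetic data, invented numbers, using numpy and scikit-learn): a model trained without the protected attribute still reproduces the disparity, because a correlated proxy variable remains in the training data.

```python
# Hypothetical sketch: removing a protected attribute does not remove bias
# when a proxy variable (standing in for something like a zip code)
# correlates with it. All data below is synthetic, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Protected attribute (0 or 1) and a proxy that matches it 90% of the time.
group = rng.integers(0, 2, n)
proxy = np.where(rng.random(n) < 0.9, group, 1 - group)

# Biased historical outcomes: group 1 received pain relief far less often.
treated = (rng.random(n) < np.where(group == 1, 0.3, 0.7)).astype(int)

# Train WITHOUT the protected attribute -- only the proxy is available.
model = LogisticRegression().fit(proxy.reshape(-1, 1), treated)
predictions = model.predict(proxy.reshape(-1, 1))

for g in (0, 1):
    rate = predictions[group == g].mean()
    print(f"predicted treatment rate, group {g}: {rate:.2f}")
# The model never saw `group`, yet its predictions still split along group
# lines, because the proxy carries nearly the same information.
```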

Bias Creep

Data bias creeps into healthcare in unexpected ways. Consider the fact that animal models used in laboratories across the world to discover and test new pain medications are almost entirely male. As a result, many medications, including pain medication, are not optimized for females. So, it makes sense that even common pain medications like ibuprofen and naproxen have been proven to be more effective in men than women and that women tend to experience worse side effects from pain medication than men do.

In reality, male rodents aren't perfect test subjects either. Studies have also shown that both female and male rodents' responses to pain levels differ depending on the sex of the human researcher present. The stress response elicited in rodents to the olfactory presence of a sole male researcher is enough to alter their responses to pain.

While this example may seem to be a departure from AI, it is in fact deeply connected — the current treatment choices we have access to were implicitly biased before the treatments ever made it to clinical trials. The challenge of AI equity is not a purely technical problem, but a very human one that begins with the choices that we make as scientists.

Unequal Data Leads to Unequal Benefits

In order for all of society to enjoy the many benefits that AI systems can bring to healthcare, all of society must be equally represented in the data used to train these systems. While this may sound straightforward, it's a tall order to fill.

Data from some populations don't always make it into training datasets. This can happen for a number of reasons. Some data may not be as accessible or it may not even be collected at all due to existing systemic challenges, such as a lack of access to digital technology or simply being deemed unimportant. Predictive models are created by categorizing data in a meaningful way. But because there's generally less of it, "minority" data tends to be an outlier in datasets and is often wiped out as spurious in order to create a cleaner model.
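
A synthetic illustration of that last point: a routine outlier filter applied to pooled data can keep nearly all of the majority group's records while discarding most of the minority group's, because the smaller group's legitimate values look like noise in the aggregate. The group sizes, distributions, and cutoff below are all invented.

```python
# Synthetic sketch: naive outlier removal disproportionately discards
# records from an underrepresented group. All numbers are invented.
import numpy as np

rng = np.random.default_rng(1)

# 95% majority group, 5% minority group whose measurements cluster elsewhere.
majority = rng.normal(loc=100.0, scale=10.0, size=9_500)
minority = rng.normal(loc=140.0, scale=10.0, size=500)
values = np.concatenate([majority, minority])
is_minority = np.concatenate([np.zeros(9_500, bool), np.ones(500, bool)])

# A common "cleaning" step: drop anything more than 2 standard deviations
# from the pooled mean.
z_scores = (values - values.mean()) / values.std()
kept = np.abs(z_scores) < 2.0

print(f"majority records kept: {kept[~is_minority].mean():.0%}")  # ~99%
print(f"minority records kept: {kept[is_minority].mean():.0%}")   # ~12%
# The minority group's valid-but-different values sit in the tail of the
# pooled distribution, so the filter treats them as spurious.
```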

Data source matters because this detail unquestionably affects the outcome and interpretation of healthcare models. In sub-Saharan Africa, young women are diagnosed with breast cancer at a significantly higher rate than young women elsewhere. This reveals the need for AI tools and healthcare models tailored to this demographic group, as opposed to AI tools used to detect breast cancer that are trained only on mammograms from the Global North. Likewise, a growing body of work suggests that algorithms used to detect skin cancer tend to be less accurate for Black patients because they are trained mostly on images of light-skinned patients. The list goes on.

We are creating tools and systems that have the potential to revolutionize the healthcare sector, but the benefits of these developments will only reach those represented in the data.

So, what can be done?

Part of the challenge in getting bias out of data is that high-volume, diverse, and representative datasets are not easy to access. Training datasets that are publicly available tend to be extremely narrow, low-volume, and homogeneous—they capture only a partial picture of society. At the same time, a wealth of diverse health data is captured every day in many healthcare settings, but data privacy laws make accessing these more voluminous and diverse datasets difficult.

Data protection is of course vital. Big Tech and governments do not have the best track record when it comes to the responsible use of data. However, if transparency, education, and consent for the sharing of medical data were more purposefully regulated, far more diverse and high-volume datasets could contribute to fairer representation across AI systems and result in better, more accurate results for AI-driven healthcare tools.

But data sharing and access is not a complete fix to healthcare's AI problem. Better and personalized healthcare through AI is still a hugely challenging problem that will take an army of scientists and engineers. At the end of the day, we want to teach our algorithms to make good choices but we are still figuring out what good choices should look like for ourselves.

AI presents the opportunity to bring greater personalization to healthcare, but it equally presents the risk of entrenching existing inequalities. We have the opportunity in front of us to take a considered approach to data collection, regulation, and use that will provide a fuller and fairer picture and enable the next steps for AI in healthcare.

------

Angela Wilkins is the executive director of the Ken Kennedy Institute at Rice University.

In a guest column, these lawyers explain the pros and cons of using AI for hiring. Photo via Getty Images

Here's what Houston employers need to know about using artificial intelligence in the hiring process

guest column

Workplace automation has entered the human resources department. Companies rely increasingly on artificial intelligence to source, interview, and hire job applicants. These AI tools are marketed to save time, improve the quality of a workforce, and eliminate unlawful hiring biases. But is AI incapable of hiring discrimination? Can a company escape liability for discriminatory hiring because "the computer did it"?

Ultimately, whether AI is a solution or a landmine depends on how carefully companies implement the technology. AI is not immune from discrimination and federal law holds companies accountable for their hiring decisions, even if those decisions were made in a black server cabinet. The technology can mitigate bias, but only if used properly and monitored closely.

Available AI tools

The landscape of AI technology is continually growing and covers all portions of the hiring process — recruiting, interviewing, selection, and onboarding. Some companies use automated candidate sourcing technology to search social media profiles to determine which job postings should be advertised to particular candidates. Others use complex algorithms to determine which candidates' resumes best match the requirements of open positions. And some employers use video interview software to analyze facial expressions, body language, and tone to assess whether a candidate exhibits preferred traits.

Federal anti-discrimination law

Although AI tools likely have no intent to unlawfully discriminate, that does not absolve the employers who use them from liability. This is because the law contemplates both intentional discrimination (disparate treatment) and unintentional discrimination (disparate impact). The larger risk for AI lies with disparate impact claims. In such lawsuits, intent is irrelevant. The question is whether a facially neutral policy or practice (e.g., use of an AI tool) has a disparate impact on a particular protected group, defined by characteristics such as race, color, national origin, gender, or religion.

The Equal Employment Opportunity Commission, the federal agency in charge of enforcing workplace anti-discrimination laws, has demonstrated an interest in AI and has indicated that such technology is not an excuse for discriminatory impacts.

Discrimination associated with AI tools

The diversity of AI tools means that each type of technology presents unique potential for discrimination. One common thread, however, is the potential for input data to create a discriminatory impact. Many algorithms rely on a set of inputs to understand search parameters. For example, a resume screening tool is often set up by uploading sample resumes of high-performing employees. If those resumes favor a particular race or gender, and the tool is instructed to find comparable resumes, then the technology will likely reinforce the existing homogeneity.

Some examples are less obvious. Sample resumes may include employees from certain zip codes that are home to predominately one race or color. An AI tool may favor those zip codes, disfavoring applicants from other zip codes of different racial composition. Older candidates may be disfavored by an algorithm's preference for ".edu" email addresses. In short, if a workforce is largely comprised of one race or one gender, having the tool rely on past hiring decisions could negatively impact applicants of another race or gender.

Steps to mitigate risk

There are a handful of steps that employers can take to use these technologies and remain compliant with anti-discrimination laws.

First, companies should demand that AI vendors disclose as much as possible about how their products work. Vendors may be reluctant to disclose details about proprietary information, but employers will ultimately be responsible for discriminatory impacts. Thus, as part of contract negotiations, a company should consider seeking indemnification from the vendor for discrimination claims.

Second, companies should consider auditing the tool to ensure it does not yield a disparate impact on protected individuals. Along the same lines, companies should be careful in selecting input data. If the inputs reflect a diverse workforce, a properly functioning algorithm should, in theory, replicate that diversity.
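
As a sketch of what such an audit might look like, one widely used screening heuristic is the EEOC's "four-fifths rule": a selection rate below 80 percent of the highest group's rate is treated as preliminary evidence of adverse impact. The groups and counts below are invented; a real audit would use the employer's actual decision logs and proper statistical testing.

```python
# Toy disparate-impact audit using the four-fifths rule. The decision log
# is fabricated; real audits need actual data and statistical rigor.
from collections import Counter

# (applicant_group, was_selected) pairs, as an employer might log for each
# screening decision an AI tool makes.
decisions = (
    [("group_a", True)] * 60 + [("group_a", False)] * 40
    + [("group_b", True)] * 35 + [("group_b", False)] * 65
)

applied = Counter(group for group, _ in decisions)
selected = Counter(group for group, was_hired in decisions if was_hired)
rates = {g: selected[g] / applied[g] for g in applied}

benchmark = max(rates.values())  # highest group's selection rate
for group, rate in sorted(rates.items()):
    ratio = rate / benchmark
    flag = "REVIEW" if ratio < 0.8 else "ok"
    print(f"{group}: selection rate {rate:.0%}, impact ratio {ratio:.2f} -> {flag}")
# group_a selects at 60%, group_b at 35% -> impact ratio 0.58, flagged.
```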

Third, employers should stay abreast of developments in the law. This is an emerging field, and state legislators have taken notice. Illinois recently passed a law governing the use of AI in the workplace, and other states, including New York, have introduced similar bills.

AI can solve many hiring challenges and help cultivate a more diverse and qualified workforce. But the tools are often only as unbiased as the creators and users of that technology. Careful implementation will ensure AI becomes a discrimination solution — not a landmine.

------

Kevin White is a partner and Dan Butler is an associate with Hunton Andrews Kurth LLP, which has an office in Houston.

Artificial intelligence is changing Houston — one industry at a time. Photo via Getty Images

3 ways artificial intelligence is changing Houston's future

Guest column

Artificial intelligence is the buzzword of the decade. From grocery shopping assistance to personal therapy apps, AI has sunk its teeth into every single industry. Houston is no exception to the AI boom. Enterprise-level companies and startups are already flocking to H-town to make their mark in AI and machine learning.

Since the world is generating more data every minute — 1,736 terabytes to be exact — Houston-based companies are already thinking ahead about how to make sense of all of that information in real time. That's where AI comes in. By 2021, 80 percent of emerging technologies will have AI foundations — and Houston is already ninth on the list of AI-ready cities in the world.

AI and machine learning can process large amounts of data quickly and use that data to inform decisions much like a human would. Here are three ways Houston-based companies are using these emerging technologies to revolutionize the city's future.

Health care

The health care industry is primed for AI's personalization capabilities. Each patient a doctor or nurse encounters has different symptoms, a different health background, and different prescriptions to keep track of. Managing that amount of information incorrectly can be dangerous. With AI, diseases are diagnosed more quickly, medications are administered more accurately, and nurses have help monitoring patients.

Decisio Health Inc., a Houston-based health tech startup, has already made its mark in the health care industry with AI software that is helping to tackle the COVID-19 pandemic. The software, developed in collaboration with GE Healthcare Inc., allows health care providers to remotely monitor patients. By looking at data from ventilators, patient monitoring systems, health records, and other sources, doctors can make better decisions about patients from a safe distance.

Climate change

Climate change won't be solved overnight. It's an issue that covers water salinity, deforestation, and even declining bee populations. With a problem as large as climate change, huge amounts of data are collected and need to be analyzed. AI can interpret all of that information, model possible future outcomes, track current weather patterns, and surface solutions to environmental destruction.

One Houston-based company in the energy tech industry, Enovate Upstream, has created a new AI platform that will help digitize the oil and gas sector. The platform looks at data from digital drilling, digital completions, and digital production to give oil companies real-time production forecasting. The company hopes this will make oil production more efficient and reduce carbon emissions. Since oil drilling and fracking are a major cause for concern around climate change, that efficiency could help slow climate change and make the industry as a whole more climate-conscious.

Energy

Energy is an industry rich with data opportunities—and as Houston's energy sector grows, AI has become a core part of its work. Houston's large influence in the energy sector has primed it for AI integration from startups like Adapt2 Solutions Inc. By using AI and machine learning in its software, the company hopes to help energy companies make strategic predictions on how to serve energy to the public efficiently. Its work has become especially important in the wake of COVID-19 and the resulting changes in energy needs.

Another Houston-based company using AI to influence the energy industry is the retail energy startup Evolve Energy. Its AI and machine learning system helps customers find better prices on fluctuating renewable resources—saving them money on electricity and reducing emissions. The positive public feedback on its AI model shows how energy companies can use emerging technologies like AI to benefit their communities.

The bottom line

Houston is more primed than most cities to integrate AI and machine learning into every industry. While there are valid concerns as to how much we should lean on technology for necessary daily tasks, it's clear that AI isn't going anywhere. And it's clear that Houston is currently taking the right steps to continue its lead in this emerging AI market.

------

Natasha Ramirez is a Utah-based tech writer.

James Yockey is a co-founder of Landdox, which recently integrated with ThoughtTrace. Courtesy of Landdox

These two Houston software companies are making contracts less cumbersome for oil and gas companies

Team work

The biggest asset of most oil and gas companies is their leasehold: the contracts or deeds that give the company the right to either drill wells and produce oil and gas on someone else's land, or give them title to that land outright. A typical oil and gas company is involved in thousands of these uniquely negotiated leases, and the software to keep these documents organized hasn't been updated in more than a decade, says James Yockey, founder of Houston-based Landdox.

Landdox aims to fill that gap, providing an organizational framework for companies' contracts and leaseholds. The company recently entered into an integration with Houston-based ThoughtTrace, an artificial intelligence program that can scan cumbersome, complicated contracts and leaseholds and pull out key words and provisions.

With this integration, companies can use ThoughtTrace to easily identify key provisions of their contracts, and then sync up those provisions with their Landdox account. From there, Landdox will organize those provisions into easy-to-use tools like calendars, reminders and more.
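
For a rough sense of the workflow being automated, here is a toy Python sketch. It is not ThoughtTrace's or Landdox's actual API; the lease language, date format, and 30-day reminder window are all invented. It shows the general pattern: pull dated obligations out of lease text, then turn each one into a calendar-style reminder.

```python
# Toy illustration of the extract-then-remind pattern -- not either
# company's real API. The lease text and reminder window are invented.
import re
from datetime import datetime, timedelta

lease_text = """
Lessee shall commence drilling operations on or before 2025-06-01.
A shut-in royalty payment of $1,200 is due on 2025-09-15.
"""

# Find lines containing an ISO-style date, standing in for the provisions
# an AI extraction tool would surface from a 70- or 80-page lease.
provisions = [
    (datetime.strptime(match.group(1), "%Y-%m-%d"), line.strip())
    for line in lease_text.splitlines()
    if (match := re.search(r"(\d{4}-\d{2}-\d{2})", line))
]

# Turn each obligation into a reminder 30 days ahead of its deadline.
for due_date, text in sorted(provisions):
    remind_on = due_date - timedelta(days=30)
    print(f"{remind_on:%Y-%m-%d}: {due_date:%b %d} deadline -> {text}")
```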

The framework behind the integration
The concept behind Landdox isn't entirely new — there are other software platforms built to organize oil and gas companies' assets — but it's the first company in this space that's completely cloud-based, Yockey says.

"Within these oil and gas leases and other contracts are really sticky provisions … if you don't understand them, and you're not managing them, it can cause you to forfeit a huge part of your asset base," Yockey says. "It can be a seven-, eight-, or nine-digit loss."

These contracts and leases can be as long as 70 or 80 pages, Yockey says, with tricky provisions buried in them. Before the integration with ThoughtTrace, oil and gas companies had to manually pore over these contracts and identify key provisions that could then be sent over to Landdox, which would organize the data and documents in an easy-to-use platform. The ThoughtTrace integration removes that time-consuming step for oil and gas companies.

"[ThoughtTrace] identifies the most needle moving provisions and obligations and terms that get embedded in these contracts by mineral owners," Yockey says. "It's a real source of leverage for the oil and gas companies. You can feed ThoughtTrace the PDF of the lease and their software will show you were these provisions are buried."

The origin story
Landdox was founded in 2015, and is backed by a small group of angel investors. Yockey says the investors provided a "little backing," and added that Landdox is a "very capital-efficient" software company.

Landdox and ThoughtTrace connected in 2017, when the companies were working with a large, private oil and gas company in Austin. The Austin-based oil and gas company opted to use Landdox and ThoughtTrace in parallel, which inspired the two companies to develop an integrated prototype.

"We built a prototype, but it was clear that there was a bigger opportunity to make this even easier," Yockey says. "To quote the CEO of ThoughtTrace, he called [the integration] an 'easy button.'"

The future of ERP software
Landdox's average customer is a private equity-backed E&P or mineral fund, Yockey says, though the company also works with closely held, family-owned companies. Recently, however, Landdox has been adding a new kind of company to its client base.

"What's interesting is we're starting to add a new customer persona," Yockey says. "The bigger companies – the publicly traded oil and gas companies –have all kinds of different ERP (Enterprise Resource Planning) software running their business, but leave a lot to be desired in terms of what their team really needs."

At a recent North American Prospect Expo summit, Yockey says, half a dozen large-cap oil and gas producers invited Landdox to their offices to discuss potentially supplementing their ERP software.

"Instead of trying to be all things to all people, we stay in our lane, but find cool ways to connect with other software (companies)," Yockey says.


7+ can't-miss Houston business and innovation events in July

where to be

Houstonians are transitioning into a new summer month, and the city's business community is mixing in networking and conference events with family vacations and time off. Here's a rundown of what all to throw on your calendar for July when it comes to innovation-related events.

This article will be updated as more business and tech events are announced.

July 10 — Have a Nice Day Market at the Ion

Stop by for a one-of-a-kind vendor market, #HaveANiceDayHTX, taking place at the Ion, Houston's newest urban district and collaborative space, designed to give the city a place where entrepreneurial, corporate, and academic communities can come together. Free to attend, with free parking onsite.

Have a Nice Day is a creative collective with a goal of celebrating BIPOC makers, creators, and causes.

The event is Sunday, July 10, 4 to 8 pm, at The Ion. Click here to register.

July 12 — One Houston Together Webinar Series

In the first installment of the Partnership's One Houston Together webinar series, we will discuss supplier diversity, an often underutilized resource for business. What is it, and why is it important? How can supplier diversity have a long-term impact on your business, help strengthen your supply chain, and make a positive community impact?

The event is Tuesday, July 12, noon to 1 pm, online. Click here to register.

July 14 — Investor Speaker Series: Both Sides of the Coin

In the next installment of Greentown Labs' Investor Speaker Series, sit down with two Greentown founders and their investors as they talk about their experiences working together before, during, and after an equity investment was made in the company. Attendees will get a behind-the-scenes look at one of the most important relationships in a startup’s journey and what best practices both founders and investors can follow to keep things moving smoothly.

The event is Thursday, July 14, 1 to 2:30 pm, online. Click here to register.

July 15 — SBA Funding Fair

Mark Winchester, the Deputy District Director for the Houston District Office of the U.S. Small Business Administration, will give a short intro to the programs the mentors will discuss. There will be three mentors covering government-guaranteed loans and two to three mentors co-mentoring with remote SBIR experts.

The event is Friday, July 15, 10:30 am to 1 pm, at The Cannon - West Houston. Click here to register.

July 16 — Bots and Bytes: Family STEAM Day

Join the Ion for a hands-on learning experience in tech and robotics, offering insight into the professional skills and concepts needed to excel in a robotics or tech career. The event is tailored to 9-14-year-olds for a fun STEM experience.

The event is Saturday, July 16, 10 am to 1 pm, at The Ion. Click here to register.

July 19 — How to Start a Startup

You have an idea...now what? Before you start looking for funding, it's important to make sure that your idea is both viable and valuable: if it doesn't have a sound model and a market willing to pay for it, investors won't be interested anyway.

The event is Tuesday, July 19, 5:30 to 7:30 pm, at The Ion. Click here to register.

July 20 — Perfecting Your Pitch

Join the Ion for their series with DeckLaunch and Fresh Tech Solutionz as they discuss the importance and value of your pitch deck when reaching your target audience.

The event is Wednesday, July 20, 5:30 to 6:30 pm, at The Ion. Click here to register.

July 21 — Transition On Tap: Investor Readiness with Vinson & Elkins LLP

Attorneys from Greentown Labs’ Gigawatt Partner Vinson & Elkins LLP, a leading fund- and company-side advisor for clean energy financing, will present an overview of legal considerations in cleantech investing, geared especially toward early-stage companies and investors. The presentation will cover the types of investors and deals in the cleantech space and also provide background on negotiating valuation, term sheets, and preparing for diligence.

The event is Thursday, July 21, 5 to 7 pm, at Greentown Houston. Click here to register.

July 28 — The Cannon Community 2nd Annual Town Hall Event

Baker Tilly, a partner of The Cannon, has played an integral part in the success of Cannon member companies. Join the Cannon community for The Cannon's 5-year anniversary celebration!

The event is Thursday, July 28, 4 to 7 pm, at The Cannon - West Houston. Click here to register.

Texas-based dating app sponsors 50 female athletes to honor 50 years of Title IX

teaming up

Bumble is causing a buzz once again, this time for collegiate women athletes. Founded by recent Texas Business Hall of Fame inductee Whitney Wolfe Herd, the Austin-based and female-first dating and social networking app this week announced a new sponsorship for 50 collegiate women athletes with NIL (name, image, and likeness) deals in honor of the 50th anniversary of Title IX.

Established in 1972, the federal law prohibits sex-based discrimination in any school or other education program or activity that receives federal money. According to the Women’s Sports Foundation, the number of women in collegiate athletics has increased significantly since Title IX, from 15 percent to 44 percent.

That said, equity continues to lag in many ways, specifically for BIPOC women, who make up only 14 percent of college athletes. The findings also show that men have approximately 60,000 more collegiate sports opportunities than women, despite the fact that women make up a larger portion of the collegiate population.

With this in mind, Bumble’s new sponsorship seeks to support “a wealth of overlooked women athletes around the country,” according to the beehive’s official 50for50 program page.

“We're embarking on a yearlong sponsorship of 50 remarkable women, with equal pay amounts across all 50 NIL (name, image, and likeness) contracts,” says the website. “The inaugural class of athletes are a small representation of the talented women around the country who diligently — and often without recognition — put in the work on a daily basis.”

To celebrate the launch of the program, Bumble partnered with motion graphic artist Marlene “Motion Mami” Marmolejos to create a custom video and digital trading cards that each athlete will post on their personal social media announcing their sponsorship.

“These sponsorships are an exciting step in empowering and spotlighting a diverse range of some of the most remarkable collegiate women athletes from across the country. Athletes who work just as hard as their male counterparts, and should be seen and heard,” says Christina Hardy, Bumble’s director of talent and influencer, in a separate release. “In honor of the 50th anniversary of Title IX, we are so proud to stand alongside these women and are looking forward to celebrating their many achievements throughout the year.”

“Partnering with Bumble and announcing this campaign on the anniversary of Title IX is very special,” said Alexis Ellis, a track and field athlete. “I am grateful for the progress that has been made for women in sports, and am proud to be part of Bumble’s ’50for50’ to help continue moving the needle and striving for more. I look forward to standing alongside so many incredible athletes for this campaign throughout the year.”

“I am so grateful to team up with Bumble and stand alongside these incredible athletes on this monumental anniversary,” said Haleigh Bryant, a gymnast. “Many women continue to be overlooked in the world of sports, and I am excited to be part of something that celebrates, and shines a light on, the hard work, tenacity, and accomplishments of so many great athletes.”

Last year, the NCAA announced an interim policy that all current and incoming student athletes could profit off their name, image, and likeness, according to the law of the state where the school is located, for the first time in collegiate history.

The 50for50 initiative adds to Bumble’s previous multi-year investments in sports. In 2019, Bumble also launched a multi-year partnership with global esports organization Gen.G to create Team Bumble, the all-women professional esports team.

To see the 50for50 athletes, visit the official landing page.

------

This article originally ran on CultureMap.