StarApple AI | Dr. Shirley Budall | November 24, 2025
The Missing Data Problem: Gender Data Gaps and AI Policy Failure in the Caribbean
Before we can build AI systems that serve Caribbean women, we must confront the uncomfortable fact that our data collection systems were never designed to see them clearly.
Every AI system reflects the data it was trained on. It learns patterns, makes predictions, and delivers recommendations based on what it has seen in that training data. If the data is incomplete, skewed, or biased in how it represents a population, the system's outputs will be incomplete, skewed, or biased for that population. This is not a controversial technical claim. It is a foundational premise of the algorithmic fairness literature of the past decade.
My hypothesis in this article is specific: Caribbean AI systems trained on existing datasets will systematically underserve women because the data they depend on was collected within institutional systems that undercounted, misclassified, or simply ignored women's economic activity, health experiences, and social participation. This is not an accident of oversight; it is the predictable consequence of data collection systems designed in eras and institutions that treated women's activities as secondary or invisible.
The consequences are not abstract. They will manifest as AI health tools that make more errors for Caribbean women, credit scoring systems that deny women loans their financial behaviour justifies, social protection AI that fails to identify women in need, and agricultural advisory systems that overlook female farmers. The policy response required is not a diversity statement; it is a structural reform of how Caribbean governments collect, manage, and govern data.
Understanding the Gender Data Gap: What It Is and Why It Matters
The concept of the gender data gap was brought to wide public attention by Caroline Criado Perez in her 2019 book "Invisible Women: Exposing Data Bias in a World Designed for Men", but the underlying problem has been documented in academic literature for much longer. The gap arises when data collection systems treat "male" as the default human and fail to collect, analyse, or report data that specifically captures women's experiences.
In practice, the gap manifests in multiple ways. Economic data has historically focused on formal employment, formal business registration, and formal financial transactions, all of which underrepresent women's economic activity. Health data has been shaped by clinical research that historically excluded women from drug trials and under-invested in conditions specific to women. Social participation data often focuses on formal political institutions rather than the community and care structures where women's participation is concentrated.
In the Caribbean, these global patterns are compounded by specific regional characteristics. The informal economy is a large part of Caribbean economic life, and women are disproportionately represented within it. The Statistical Institute of Jamaica (STATIN) and its equivalents across CARICOM collect labour force data that captures formal employment more reliably than informal activity. This means that women's actual economic contribution to Caribbean societies, which includes market trading, domestic work, small-scale food production, and community care provision, is systematically understated in the datasets that AI systems will be trained on.
The Informal Economy and Its Data Shadow
I want to dwell on the informal economy issue because it is particularly acute in the Caribbean and particularly underappreciated in AI policy discussions. Consider what happens when an AI credit scoring system is trained on data from a Caribbean financial institution. The system will learn to associate creditworthiness with features it can measure: salary records, formal tax contributions, documented business transactions, credit history within the formal banking system, and social insurance payments.
A woman who sells produce at Coronation Market in Kingston every Saturday morning, who has done so successfully for fifteen years, who manages her household finances with considerable skill, and who has never defaulted on an informal credit arrangement, has almost none of these measurable features. She does not appear in formal payroll records. Her Saturday income does not appear in tax records unless she is among the very few informal vendors who file. Her business transactions are cash-based. Her credit history within the formal banking system is thin or absent.
An AI credit scoring system will classify her as a poor credit risk, not because she is one, but because the data available about her does not capture the economic reality of her life. The system does not fail because it is malicious; it fails because it looks for evidence of financial capability only in formal records, which are precisely where women's economic activity is systematically underrepresented.
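A minimal sketch of this failure mode follows. It is an illustration only: the feature names, the weights, and the two applicants are hypothetical and are not drawn from any real lender or scoring model. The point is that a score computed solely from formally recorded features cannot register a repayment record that was never captured.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    name: str
    payroll_months: int       # months of formal payroll records
    tax_filings: int          # years of tax filings on record
    bank_credit_years: int    # years of formal credit history
    informal_repayments: int  # informal loans repaid on time (never seen by the scorer)


# Hypothetical weights; a real model would learn its own from formal-sector data.
FORMAL_WEIGHTS = {"payroll_months": 1.0, "tax_filings": 5.0, "bank_credit_years": 8.0}


def formal_only_score(applicant: Applicant) -> float:
    """Score an applicant using only formally recorded features, capped at 100."""
    raw = sum(weight * getattr(applicant, field) for field, weight in FORMAL_WEIGHTS.items())
    return min(raw, 100.0)


# Illustrative applicants (made-up numbers).
vendor = Applicant("market vendor", 0, 0, 0, informal_repayments=15)
clerk = Applicant("salaried clerk", 36, 3, 2, informal_repayments=0)

vendor_score = formal_only_score(vendor)
clerk_score = formal_only_score(clerk)
```

Under these assumed weights the vendor scores zero however reliable her informal repayment history is, because `informal_repayments` is not among the features the scorer reads. Retuning the weights would not change this; only collecting and using the missing data would.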
The ILO's decent work agenda and its ongoing work on the informal economy across CARICOM labour markets provide a rich body of evidence on this structural characteristic of Caribbean economic life. This evidence base should be the starting point for any Caribbean government designing AI systems for financial services, social protection, or labour market analysis.
Health Data Gaps and Their Consequences
Health AI is advancing faster than health data governance. Diagnostic tools, triage systems, and clinical decision support tools are entering Caribbean health systems, and the datasets used to train and validate them present specific problems for Caribbean women.
The most widely documented global health data gap affecting women is the historical underrepresentation of women in clinical drug trials. This is relevant to Caribbean AI health tools because many of the clinical databases used to train AI diagnostic systems draw from the same clinical research environment that excluded women. An AI diagnostic tool trained primarily on data from male patients can be expected to have lower diagnostic accuracy for equivalent conditions in female patients. This is a documented phenomenon, not a theoretical risk.
Caribbean-specific health data gaps add further dimensions. Non-communicable disease data across CARICOM is often not fully disaggregated by sex, age, and socioeconomic status simultaneously, limiting the ability to understand how these variables interact to produce differential health outcomes. Mental health data for Caribbean women is particularly sparse: depression, anxiety, and the mental health consequences of gender-based violence are under-reported and under-recorded in clinical systems because of stigma, limited mental health service capacity, and inadequate screening in primary care settings.
UNESCO's monitoring of AI ethics implementation, including gender provisions in educational and research AI, has noted that health AI is among the highest-risk areas for gender bias. Caribbean health ministries deploying AI tools should treat the absence of Caribbean-specific validation data as a presumption against deployment rather than a minor caveat to be noted in documentation.
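The "presumption against deployment" can be made concrete as a validation gate. The sketch below is one possible reading, not an established regulatory test: the function names, the minimum case count, and the allowed sensitivity gap are all assumptions chosen for illustration. It refuses approval when local validation cases for any required group are missing, and otherwise compares sensitivity (the share of true cases the tool detects) across groups.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """records: iterable of (group, true_label, predicted_label), 1 = condition present.
    Returns {group: sensitivity} for every group with at least one true case."""
    true_positives = defaultdict(int)
    positives = defaultdict(int)
    for group, truth, predicted in records:
        if truth == 1:
            positives[group] += 1
            if predicted == 1:
                true_positives[group] += 1
    return {g: true_positives[g] / positives[g] for g in positives}


def deployment_gate(records, required_groups, min_cases=30, max_gap=0.05):
    """Return (approved, reasons). Thresholds are hypothetical placeholders."""
    records = list(records)
    min_cases = max(min_cases, 1)  # at least one true case is needed to compute sensitivity
    positives = defaultdict(int)
    for group, truth, _ in records:
        if truth == 1:
            positives[group] += 1

    reasons = []
    for group in required_groups:
        if positives[group] < min_cases:
            reasons.append(f"insufficient local validation cases for {group}: {positives[group]}")

    # Only compare performance once every required group has enough cases.
    if not reasons:
        sens = sensitivity_by_group(records)
        values = [sens[g] for g in required_groups]
        gap = max(values) - min(values)
        if gap > max_gap:
            reasons.append(f"sensitivity gap {gap:.2f} exceeds allowed {max_gap:.2f}")
    return (not reasons, reasons)
```

The design choice worth noting is the ordering: absence of data blocks approval before any accuracy figure is computed, so a tool cannot pass simply because no one measured its performance for women.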
Digital Data Gaps: The GSMA Evidence
The GSMA Mobile Gender Gap Report 2025 confirms that women in low- and middle-income countries continue to lag behind men in mobile internet use, with the gap persisting even as overall connectivity increases. This has a direct consequence for AI systems that rely on digital behavioural data: the digital data that AI systems use for training, validation, and personalisation systematically underrepresents the online behaviour of women who are less connected.
This matters across a wide range of AI applications. A personalised learning AI in Caribbean schools will have weaker models for female students who have less device access at home. A digital health information tool will have less reliable usage pattern data for women who access it less frequently due to data cost constraints. A financial AI tool that uses mobile money transaction history to assess creditworthiness will have thinner data for women who use mobile money less frequently.
The GSMA data is particularly useful because it disaggregates the mobile gender gap by country income level, identifying the specific barriers driving the gap in different contexts. For Caribbean policymakers, this provides an evidence base for targeted interventions: where affordability is the primary barrier in a given market, the policy response differs from the one needed where safety concerns or the relevance of content are the primary barriers. AI systems should not be deployed as if these gaps do not exist.
The Policy Governance Failure
Gender data gaps in the Caribbean are not primarily a technical problem; they are a governance problem. Governments have the authority to require that data collection systems be designed to capture women's economic activity, health experiences, and social participation. They have the authority to require that AI systems be validated on representative data before deployment. They have the authority to require gender-disaggregated reporting from AI systems in public service contexts. Most Caribbean governments have not exercised these authorities in relation to AI.
The Jamaica Data Protection Act 2020, administered by the Office of the Information Commissioner, establishes principles of data accuracy and purpose limitation that are directly relevant to gender data quality in AI training. A dataset used to train an AI system that is systematically inaccurate in its representation of women's economic activities could arguably fail the accuracy principle of the Act. But this argument has not been tested, and the Office has not yet published guidance on AI training data quality from a data protection perspective.
The EU AI Act entered into force in August 2024, and its obligations are being phased in over the following years. Its provisions for high-risk AI systems include explicit data governance requirements for training datasets (Article 10), including requirements to examine and address known biases. The EU framework therefore already treats training data quality as a compliance requirement rather than a design preference. Caribbean AI governance frameworks should incorporate equivalent requirements.
ISO/IEC 42001, published in December 2023, provides an international standard for AI management systems that includes risk assessment provisions covering data quality and representational adequacy. Caribbean organisations adopting this standard should explicitly extend its risk assessment procedures to cover gender data representational adequacy as a defined risk category.
What Good Looks Like: Closing the Gaps
Closing the gender data gaps that will distort Caribbean AI systems requires action at three levels: data collection reform, AI governance requirements, and international cooperation.
At the data collection level, the most important change is to redesign national statistical surveys to capture informal economic activity in a form that is useful for AI training and validation. STATIN's Labour Force Survey and Survey of Living Conditions should be expanded to capture the full range of women's economic activity, including informal income, unpaid care work, and community service, using methodology developed in consultation with women's organisations that know where this activity occurs and how it is organised.
Health data collection should be reformed to systematically disaggregate data by sex, age, socioeconomic status, and geography, and to incorporate validated screening tools for conditions that disproportionately affect women in Caribbean contexts. The Caribbean Public Health Agency (CARPHA) should lead a regional health data standards initiative that specifies minimum disaggregation requirements for health AI training data.
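A minimum disaggregation requirement could be checked mechanically. The sketch below assumes a dataset held as a list of records and a hypothetical minimum count per cell; neither the field names nor the threshold come from CARPHA or any existing standard. It lists every joint cell (for example, one combination of sex and area) that has too few records to support analysis.

```python
from collections import Counter
from itertools import product


def sparse_cells(rows, dimensions, levels, min_count=10):
    """Return {cell: count} for joint cells with fewer than min_count records.

    rows: list of dicts, one per record
    dimensions: keys to disaggregate by, e.g. ["sex", "area"]
    levels: {dimension: [expected values]}, so that empty cells are reported too
    min_count: hypothetical minimum number of records per cell
    """
    # Records missing a dimension get None for it and so never match an expected cell.
    counts = Counter(tuple(row.get(d) for d in dimensions) for row in rows)
    expected_cells = product(*(levels[d] for d in dimensions))
    return {cell: counts[cell] for cell in expected_cells if counts[cell] < min_count}


# Illustrative data only.
example_rows = (
    [{"sex": "M", "area": "urban"}] * 3
    + [{"sex": "F", "area": "urban"}] * 2
    + [{"sex": "M", "area": "rural"}]
)
example_gaps = sparse_cells(
    example_rows,
    dimensions=["sex", "area"],
    levels={"sex": ["F", "M"], "area": ["urban", "rural"]},
    min_count=2,
)
```

Counting joint cells rather than each variable separately matters: a dataset can look balanced by sex and balanced by geography while containing almost no rural women.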
At the AI governance level, public sector AI procurement should require vendors to demonstrate that their training data adequately represents the populations the system will serve, including Caribbean women. This requirement should be standard in government procurement documentation, enforced by the Office of the Information Commissioner or an equivalent designated authority.
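One way a vendor might demonstrate representational adequacy is to compare each group's share of the training data with its share of the population the system will serve. The sketch below is an assumption about how such a check could look; the group labels, the population shares, and the 0.8 tolerance are invented for illustration and do not reflect any published procurement standard.

```python
def representation_report(train_counts, population_shares, tolerance=0.8):
    """Compare training-data shares with served-population shares.

    train_counts: {group: number of training records}
    population_shares: {group: share of the served population, each > 0}
    tolerance: hypothetical floor for train_share / population_share
    """
    total = sum(train_counts.values())
    if total == 0:
        raise ValueError("training data is empty")

    report = {}
    for group, population_share in population_shares.items():
        train_share = train_counts.get(group, 0) / total
        ratio = train_share / population_share
        report[group] = {
            "train_share": train_share,
            "ratio": ratio,
            "under_represented": ratio < tolerance,
        }
    return report


# Made-up figures for illustration.
example_report = representation_report(
    train_counts={"women_informal": 50, "women_formal": 250, "men": 700},
    population_shares={"women_informal": 0.25, "women_formal": 0.25, "men": 0.5},
)
flagged = [group for group, row in example_report.items() if row["under_represented"]]
```

With these made-up figures only the informal-sector group is flagged, even though women overall make up 30 percent of the records. A share-based check is a floor, not proof of adequacy: it says nothing about whether the records that do exist describe women's circumstances accurately.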
Recommendations
- Commission a Caribbean Gender Data Audit within 12 months. CARICOM governments should jointly commission an audit of the major datasets used in Caribbean AI applications across financial services, health, education, agriculture, and social protection. The audit should assess the adequacy of representation of women, particularly women in informal employment and rural areas, and produce a ranked list of gaps requiring urgent remediation. UN Women Caribbean and the Caribbean Development Bank should be invited to co-fund this work.
- Expand STATIN's mandate and capacity to collect gender-disaggregated informal economy data. The Statistical Institute of Jamaica should be given explicit statutory responsibility for collecting data on informal economic activity disaggregated by gender, with dedicated funding for survey methodology development. The results should be published in machine-readable format suitable for use in AI training and validation, with equivalent actions required of national statistics offices across CARICOM.
- Establish minimum data representational adequacy standards for public sector AI procurement. Jamaica's Office of the Information Commissioner, in consultation with sectoral regulators, should publish minimum standards for the representational adequacy of training data used in AI systems procured by government. Systems unable to demonstrate that their training data adequately represents Jamaican women, including women in informal employment and rural areas, should not be approved for public sector deployment.
- Require CARPHA to lead a regional health data standards initiative. The Caribbean Public Health Agency should develop and publish minimum standards for sex and gender disaggregation in health data used to train or validate AI systems deployed in Caribbean health settings. These standards should align with WHO guidance on sex and gender analysis in health research, and compliance with them should be a condition of regulatory approval for health AI in CARICOM member states.
- Fund partnerships between Caribbean national statistics offices and mobile network operators. Mobile network operators hold extensive data on women's digital access and usage patterns. Under appropriately designed privacy-protective frameworks, this data could supplement national statistical collection and provide more accurate training data for AI systems serving Caribbean women. The telecommunications regulatory authorities across CARICOM should develop model data-sharing frameworks enabling this cooperation.
- Incorporate gender data adequacy into Jamaica Data Protection Act enforcement. The Office of the Information Commissioner should publish guidance clarifying that the accuracy principle of the Jamaica Data Protection Act 2020 applies to AI training datasets, and that datasets used to train AI systems affecting Jamaican citizens should be assessed for gender representational adequacy as part of data protection compliance reviews. A compliance checklist for AI developers should accompany this guidance.
Conclusion
The gender data gaps that pervade Caribbean statistical systems are not passive omissions. They are active sources of harm, producing policy decisions, resource allocations, and now AI systems that systematically underserve women. When these gaps migrate into AI training data, they become embedded in systems that operate at scale, in real time, making consequential decisions about who receives credit, who receives health diagnoses, who receives social protection, and who receives opportunity.
The governance response must match the scale of the problem. Individual bias audits of individual AI systems are necessary but insufficient. What is required is a systematic reform of how Caribbean governments collect data, what they require of AI developers in terms of data quality, and how they hold those developers accountable when their systems fail to serve women adequately.
You cannot build AI systems that serve Caribbean women from datasets that were never designed to see them. Closing the gender data gap is not a peripheral equity concern. It is a technical precondition for any Caribbean AI strategy that claims to serve the whole population rather than only those whose lives have always been visible in the data. The next strategy document that skips this step is choosing who the system will fail.
Frequently Asked Questions
What is a gender data gap?
A gender data gap is the absence or inadequacy of data about women's lives, experiences, and activities in datasets used to make decisions. Gender data gaps exist because historical data collection systems were designed to measure the activities and experiences that were considered economically and socially important, and these systems consistently undervalued or ignored activities predominantly performed by women, including unpaid care work, informal trade, and community health provision. When these datasets become AI training material, the gaps become systematic AI failures.
How does the informal economy create gender data gaps in the Caribbean?
Caribbean women are disproportionately represented in the informal economy, particularly in market trading, domestic work, and small-scale agriculture. Informal economic activity is by definition absent from formal business registration systems, tax records, and social insurance databases. These formal databases are increasingly used to train AI systems for credit scoring, social protection delivery, and labour market analysis. Women whose economic lives are conducted informally are invisible to these systems, producing AI outputs that systematically exclude their circumstances from analysis and service delivery.
What health data gaps affect Caribbean women?
Caribbean women face specific health data gaps in several areas: reproductive health conditions are underrepresented in clinical datasets because gynaecological complaints have historically received less research investment than conditions affecting both sexes; chronic disease data often lacks disaggregation by sex, age, and socioeconomic status; mental health data is particularly sparse for Caribbean women facing intersecting stressors of poverty, gender-based violence, and single-parent responsibility. AI health tools trained on these datasets will perform worse for Caribbean women across all three areas.
What is the Statistical Institute of Jamaica doing about gender data gaps?
The Statistical Institute of Jamaica (STATIN) conducts regular household and labour force surveys that provide some gender-disaggregated data. However, STATIN's surveys do not yet systematically capture the full range of women's economic activity in the informal sector, do not track digital access and AI interaction disaggregated by gender, and have not been updated to collect the types of data needed to audit AI systems for gender equity. Expanding STATIN's mandate and capacity to address these gaps is a specific policy intervention that can be acted on now.
What international standards address gender data in AI?
ISO/IEC 42001, published December 2023, includes risk management provisions that encompass data quality and representational adequacy. The EU AI Act's high-risk AI requirements include obligations on data governance, including requirements that training datasets be representative and free from biases that could lead to prohibited discrimination. UNESCO's monitoring of AI ethics implementation addresses gender data representation in educational AI. These standards collectively establish that gender data adequacy is a technical requirement, not merely a political preference, for compliant AI systems.
How can Caribbean governments fund gender data collection without large new budgets?
Several mechanisms exist for expanding gender data collection without large new budget allocations. Administrative data held by government ministries, including health, education, and social protection, can be analysed and published in gender-disaggregated form with relatively modest investment in data management capacity. Coordination with private sector entities, particularly mobile network operators and financial institutions that hold data on women's digital and financial activities, can provide supplementary data under appropriate privacy frameworks. International technical assistance from ITU, UN Women, and the World Bank's Gender Data Portal programme is available for national statistical offices willing to prioritise this work.
About the Author
Dr. Shirley Budall is a Caribbean expert in gender, inclusion, and AI governance with demonstrated experience in the ethical, legal, social and governance dimensions of artificial intelligence and digital technologies. She conducts legal and regulatory framework reviews and develops policy recommendations for legal reform in AI governance, data protection, human rights, and gender equality. Dr. Budall has knowledge of international and regional AI governance standards and has advised Caribbean government institutions and regional organisations on inclusive AI policy. She is a researcher and consultant working across the CARICOM region on digital economy governance, women's rights in the digital age, and equitable technology development. Contact: insights@starapple.ai