Cal Poly Team Presents Data-Driven Reconstruction of African Californios’ Legacy at Digital Humanities Conference
By Emily Slater
Computer science master’s student Anthony Colin Herrera, left, and history lecturer Cameron Jones discovered an unexpected connection during a visit to the Smithsonian’s National Museum of American History. While in Virginia to present their research on African Californios at the DH 2024 conference, they were surprised to find an exhibit dedicated to the very community they were there to discuss.
A Cal Poly team has made remarkable strides in uncovering the history of California’s African-descended population from 1769 to 1850, as demonstrated by their recent presentation at the annual conference of the Alliance of Digital Humanities Organizations in Virginia.
Led by history lecturer Cameron Jones and computer science Professor Foaad Khosmood, the interdisciplinary group is using advanced data science to reveal an often-overlooked chapter of California’s history.
“We’re reconstructing a past that was nearly erased,” Jones explained. “It’s about more than just identifying names — it’s about understanding how these communities formed, thrived and contributed to California’s history.”
The project, one of several sponsored by Cal Poly’s Institute for Advanced Technology and Public Policy, has generated family trees, digitized census records and interactive visualizations, allowing researchers and the public to explore the contributions of African Californios.
In 1790, nearly one in five nonnative Californians were of African descent, with large communities in Los Angeles and San Jose. By 1814, as Spain expanded its military, records show that five of the six soldiers stationed in San Luis Obispo were of African ancestry.
These communities played a key role in California’s development, and the final Spanish census in 1821, just before the transition to Mexican control, reaffirmed their continued influence on the region.
To trace the legacy of the African Californios, Jones and Khosmood developed a system to match individuals across historical documents, enabling them to construct detailed family trees. The process was complicated by incomplete records and discrepancies, making it difficult to connect the dots across layers of historical data.
They relied on the Early California Population Project — a digital database of baptism, marriage and burial records from California’s missions — but these documents lacked one crucial detail: race. To fill this gap, they turned to census data, which included some racial information.
Over several months, Jones and his students scanned the census records into .csv files but faced further challenges with different spellings, accented letters and name variations.
To overcome this, the team modified an algorithm used to compare text strings. Spanish names, with variations like “S” and “Z,” required further customization, so they developed a list of letter substitutions specific to colonial Spanish to improve the algorithm’s accuracy.
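For readers curious how such a comparison might work, here is a minimal sketch in Python. It is not the team's code: it assumes a standard edit-distance (Levenshtein) calculation in which swapping letters from a substitution list costs far less than swapping unrelated letters. Only the “S” and “Z” pair comes from the description above; the other pairs, the cost values and the function names are illustrative assumptions.

```python
import unicodedata

# Hypothetical substitution list. Only the s/z pair is mentioned in the
# article; b/v and i/y are illustrative assumptions.
CHEAP_PAIRS = {frozenset("sz"), frozenset("bv"), frozenset("iy")}
CHEAP_COST = 0.2  # assumed cost for a listed substitution
FULL_COST = 1.0   # cost for any other insert, delete or substitution


def normalize(name: str) -> str:
    """Lowercase a name and strip accent marks (so 'José' becomes 'jose')."""
    decomposed = unicodedata.normalize("NFD", name.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def substitution_cost(a: str, b: str) -> float:
    """Return the cost of replacing character a with character b."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in CHEAP_PAIRS:
        return CHEAP_COST
    return FULL_COST


def name_distance(left: str, right: str) -> float:
    """Weighted edit distance between two names after normalization."""
    a, b = normalize(left), normalize(right)
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * FULL_COST for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * FULL_COST]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + FULL_COST,                   # delete ca
                curr[j - 1] + FULL_COST,               # insert cb
                prev[j - 1] + substitution_cost(ca, cb),
            ))
        prev = curr
    return prev[-1]
```

In this sketch, two spellings that differ only by accents or by a listed letter swap score close to zero, so they can be treated as likely matches, while unrelated surnames score much higher. Note that stripping accents this way also folds “ñ” into “n,” a simplification a real system might not want.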
“Data allows us to piece together details that might otherwise remain fragmented, giving us a more complete and nuanced understanding of the past,” Khosmood said.
A driving force behind the project’s technical advances is Anthony Colin Herrera, a computer science master’s student with a deep connection to his heritage.
“When I learned about the African Californios, I was struck by how little-known their story is,” Colin Herrera said. “My background made this project especially meaningful, and working with real-world data felt like the perfect way to honor that history.”
Fluent in Spanish, Colin Herrera identified “family units” based on shared last names, parent-child relationships and spousal connections. He traced generations and built family trees linking parents and children across multiple datasets.
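One way to picture the grouping step is as a graph problem: each person is a node, each parent-child or spousal relationship is a link, and a family unit is any set of people connected by those links. The Python sketch below illustrates that idea under stated assumptions. It is not Colin Herrera's method, it leaves out surname matching, and the function name and placeholder record IDs are hypothetical.

```python
from collections import defaultdict


def family_units(people, parent_links, spouse_links):
    """Group person IDs into units connected by parent-child or spousal links.

    people: iterable of person IDs (placeholders here, not real records).
    parent_links, spouse_links: iterables of (id, id) pairs.
    Returns a sorted list of sorted ID lists, one list per unit.
    """
    people = list(people)
    root = {p: p for p in people}  # union-find parent pointers

    def find(x):
        # Follow pointers to the representative, compressing the path.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for a, b in list(parent_links) + list(spouse_links):
        ra, rb = find(a), find(b)
        if ra != rb:
            root[rb] = ra  # merge the two groups

    units = defaultdict(set)
    for p in people:
        units[find(p)].add(p)
    return sorted(sorted(unit) for unit in units.values())


# Placeholder IDs only: p1 and p2 are spouses, and p3 is the child of p1.
example = family_units(
    people=["p1", "p2", "p3", "p4", "p5"],
    parent_links=[("p1", "p3")],
    spouse_links=[("p1", "p2")],
)
```

Here `example` places p1, p2 and p3 in one unit and leaves p4 and p5 on their own. Chaining such units across generations is what would turn them into a family tree.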
Their research is publicly accessible through AfricanCalifornios.org, where users can explore findings, browse family trees and view visualizations that illustrate the impact of African-descended individuals in early California.
Colin Herrera will defend his thesis this spring after spending the year refining family trees based on the project’s data. The team’s next step is to use natural language processing tools to analyze a scanned book of colonial-era land grants, extracting details like people, places and plot sizes.
As the team continues their work, Jones reflected on the importance of reclaiming these narratives: “We know a lot about the wealthy, powerful white settlers but much less about the people of color who played vital roles in shaping our state’s history. California’s past is rich with diversity, far beyond what many realize.”