Stanford professor discusses future of visually intelligent machines and human-AI collaboration

In a talk at Lawrence Livermore National Laboratory (LLNL) on May 30, Stanford University professor and AI visionary Fei-Fei Li traced a sweeping arc in visual intelligence and AI — a narrative she hopes ultimately leads to a more human-centered future.
In the talk, part of the Director’s Distinguished Lecturer Series, Li went beyond the technical milestones of her two-decade career into her philosophy of intelligence, ranging from the evolution of vision in the Cambrian period to modern-day “robot cousins” that can see and assist humans with everyday menial tasks.
“Visual intelligence is truly one of the most important corners of intelligence — not only is it important for animals and for humans as individuals, it's also important for civilization,” Li told a packed auditorium in the Livermore Computing Center. “This brings the question of ‘how do we build visually intelligent machines?’ This has been my life's passion.”
When machines learn to see: the birth of modern AI
In her introduction, LLNL Director Kim Budil called Li a “true revolutionary” and a “powerhouse in the field of computer vision.” Named one of TIME magazine’s Most Influential People in AI in 2023, Li earned her undergraduate degree in physics at Princeton University before pivoting to computer vision and machine learning. An eminent computer scientist, Li said she remains a “physicist at heart,” adding that her visit to LLNL felt like “coming home.”
From there, it didn’t take long before the discussion expanded beyond technical boundaries. Li opened her talk with a look back — way back — to the Cambrian explosion, a period more than 500 million years ago, when life on Earth suddenly diversified.
“Once animals could see, it philosophically changed the way of life. Now you have an awareness of self and others, your world and your environment, and suddenly, evolution changed,” Li explained. “The journey of the development of visual intelligence in animals is integrated and intimately connected to the development of intelligence today on planet Earth.”
When Li was a young researcher, her MIT professor, Edward Adelson, showed her a children’s drawing of pandas. One panda stood alone facing away, appearing crestfallen, while two others hugged. Adelson asked Li, “Can we make that happen?” — as in, could humans ever make a machine grasp the nuances and context of an image that humans pick up right away?
Li said that question became the “North Star” guiding her life’s work: building machines that don’t just look at the world, but understand it, and eventually, reason about it.
In her role as director of the Stanford Vision Lab, Li led the creation of ImageNet, a massive dataset of over 14 million labeled images spanning nearly 22,000 categories. It took years and a global team of students and volunteers, but the result would forever change AI. In 2012, a deep learning algorithm trained on ImageNet dramatically outperformed others in a global image classification challenge.
“The key thing we learned in that moment is that data was far too underappreciated in the field of AI and machine learning,” Li said. “Because of the ImageNet challenge … many different methods blossomed in computer vision, in solving the problem of object recognition. Now history seems to say that that moment, the combination of three ingredients — neural network algorithms, the ImageNet dataset for big data and GPUs (graphics processing units) — was the birth of modern AI.”
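For readers who want a concrete sense of what that challenge asked of a system, the sketch below classifies a single image with a network pretrained on ImageNet. It is an illustration only: the model choice (ResNet-50 via torchvision) and the input file name are assumptions, not anything presented in the talk.

```python
# Illustrative sketch: classifying one image with a network pretrained on
# ImageNet. The model (ResNet-50) and the input file are placeholder choices
# made for illustration.
import torch
from PIL import Image
from torchvision import models

weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights)
model.eval()

preprocess = weights.transforms()  # the resize/crop/normalize pipeline the model expects

image = Image.open("example.jpg").convert("RGB")  # placeholder input image
batch = preprocess(image).unsqueeze(0)            # add a batch dimension

with torch.no_grad():
    logits = model(batch)

# Map the highest-scoring output back to a human-readable ImageNet label.
top = logits.softmax(dim=1).argmax(dim=1).item()
print(weights.meta["categories"][top])
```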
From labeling objects to understanding stories
But for Li, labeling objects wasn’t enough. Objects can mean different things depending on the situation, and for a machine to truly understand an image, it needs context, relationships and reasoning. So Li’s AI lab developed systems that map the relationships between objects into “scene graphs” and build visual narratives from them. In 2015, her team was among the first to train a system that could caption photos with full sentences, a first step toward computers that could tell stories from pictures.
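At its core, a scene graph is a simple data structure: objects become nodes, and pairwise relationships become directed edges. Here is a minimal sketch of the idea; the scene, objects and relations are invented for illustration and are not drawn from Li’s datasets.

```python
# Minimal sketch of a scene graph: objects as nodes and
# (subject, predicate, object) triples as directed edges.
# The scene described here is invented for illustration.
scene_objects = {"man", "horse", "hat", "field"}

relations = [
    ("man", "riding", "horse"),
    ("man", "wearing", "hat"),
    ("horse", "standing in", "field"),
]

# A crude "caption" just reads the triples out as clauses. Real captioning
# systems learn this mapping from data rather than templating it.
caption = "; a ".join(f"{s} {p} a {o}" for s, p, o in relations)
print(f"A {caption}.")
# -> A man riding a horse; a man wearing a hat; a horse standing in a field.
```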
Much of today’s AI, Li explained, remains trapped on a flat plane, using 2D inputs to build intelligence for a 3D world. She compared the situation to trying to understand reality through shadows on a cave wall, alluding to Plato’s famous Allegory of the Cave.
“By the time we deeply think about what perceptual intelligence is truly about, we have to recognize the world is more volumetric than a flat world,” Li said. “If we only do everything in 2D, we will have lots of problems. We will have weird artifacts; we will have a hard time doing reasoning and we will not generate images or videos with realism because they don’t respect the law of physics and geometry.”
Li presented a familiar illusion: two checkerboard squares that appear different in shade but are actually identical. Human brains get tricked by the illusion because they’ve evolved to understand a 3D world — automatically inferring factors such as light source, geometry and shadows.
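The “actually identical” part of that claim is easy to check in code. Below is a minimal sketch assuming a local copy of the illusion image; the file name and the two pixel coordinates are placeholders.

```python
# Sketch: check that the two checkerboard squares really are the same color.
# The file name and the pixel coordinates are placeholders; point them at a
# local copy of the illusion and at one pixel inside each square.
from PIL import Image

img = Image.open("checker_illusion.png").convert("RGB")

square_a = img.getpixel((150, 200))  # inside the square that "looks" light
square_b = img.getpixel((300, 350))  # inside the square that "looks" dark

print(square_a, square_b)    # the same RGB triple, despite appearances
print(square_a == square_b)  # True: the difference is in the brain, not the pixels
```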
“What we really need to do is to recover everything in a 3D world,” Li explained. “That's a problem that fascinates me, and that's a problem that happens both in the digital world as well as in the natural world. If we unlock that problem, we unlock robotics, we unlock [augmented reality and virtual reality], we unlock a lot of interesting applications.”
At World Labs, an AI startup focused on spatial intelligence and generative AI where Li serves as CEO, researchers are building AI that understands space — “spatial intelligence” — allowing systems to perceive depth, navigate environments and interact meaningfully with the physical world.
“I am truly excited about this line of work,” Li said. “These are all really new techniques that we could not dream of even 10 years ago.”
Robot cousins and thought-guided action
Li’s Stanford group is also developing platforms like BEHAVIOR, a large-scale simulator for training household robots to complete thousands of daily tasks — from making tea to tidying up. She also described the use of digital “robot cousins” — robotic learners trained in very similar environments to improve generalization. In another striking example, her team paired brainwave monitoring with robotic control, allowing users to direct tasks using only their thoughts, including cooking an entire Japanese meal.
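To give a flavor of what training in such a simulator involves, here is a hedged sketch of a standard simulate-act-observe loop written against the common Gymnasium interface. The environment id is hypothetical; this is the general shape of such loops, not BEHAVIOR’s actual API.

```python
# Hedged sketch of a simulate-act-observe training loop, using the common
# Gymnasium interface. "HouseholdTidy-v0" is a hypothetical environment id;
# this is NOT BEHAVIOR's actual API, only the general shape of such loops.
import gymnasium as gym

env = gym.make("HouseholdTidy-v0")  # hypothetical household-task environment
obs, info = env.reset(seed=0)

for _ in range(1000):
    action = env.action_space.sample()  # a learned policy would act here
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:  # task finished or time limit hit
        obs, info = env.reset()

env.close()
```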
As the founding co-director of the Stanford Institute for Human-Centered AI, Li’s message was as much about human values as it was about technological ones. As AI becomes more powerful, Li urged the audience to remain grounded in purpose. She emphasized her belief in human-centered AI — systems that enhance human capability rather than replace it.
“One of the biggest societal concerns is human labor. The threat is about, ‘is AI going to take away our agency? Is AI going to take away our jobs?’ And these are important conversations to have,” Li said. “One thing that I care a lot about is replacing the word ‘replace’ and giving it a new word. I think AI is here to ‘augment’ humans. AI is here to augment everyday people. AI is here to augment scientists, artists, kids, students, doctors, nurses and patients. I think that's the conversation we should have.”
In healthcare, Li and her teams have helped develop smart sensors that monitor mobility and hygiene in hospitals and senior centers. In education, she’s advocated for AI literacy. And in public policy, she’s testified before Congress and the United Nations, served on the California Future of Work Commission and helped shape federal efforts like the National AI Research Resource Pilot.
In closing, Li showed a diagram of a ladder, with “Understanding, Reasoning and Generation” as the rungs and “Data and Algorithms” as the rails.
“The past few years we’ve put together this idea of human intelligence in three different ways: understanding, reasoning and generation,” she said. “There's just so many open questions in the space of computer vision and spatial intelligence, as well as robotic learning, that can continue to help augment humans, as well as help us to do scientific discovery.”
Contact

[email protected]
(925) 422-5539
Tags
HPC, Simulation, and Data Science
Academic Engagement
Intelligence