On Tech & Vision Podcast
The Paradigm Shift in Innovation: Remixing Existing Tech to Advance Accessibility
On Tech and Vision Podcast with Dr. Cal Roberts
This podcast is about big ideas on how technology is making life better for people with vision loss.
Technology developed specifically for people with low vision has historically been bespoke, specific, and expensive, requiring the creation of new technology for a relatively small population of users. But recently we’re seeing a paradigm shift from building new vision tech from scratch to building on top of existing technology that most people use daily.
In this episode, Dr. Cal talks with Karthik Mahadevan, the founder and CEO of Envision, about how his technology company builds on existing tech, like smartphones and smart glasses, to benefit users with low vision. Envision’s software uses AI to recognize important visual cues in a user’s environment and then relays that information to the user with computer-generated speech. Karthik’s breakthrough realization early on was that he didn’t need to build hardware when he could simply tap into the cameras of smartphones and smart glasses that people already owned, which enabled Envision to reach a much wider audience.
The episode also features an interview with Troy Otillio, the CEO of Aira. Aira’s platform links users to a human visual interpreter, who can access the camera on their device and guide the user through the task, whether reading a letter, navigating a grocery store, or finding their way around an unfamiliar city.
Ultimately, the future of creating vision technology might not be in reinventing the wheel but in improving it and figuring out how to turn the tech people already use every day into easy and accessible solutions for people with low vision.
Podcast Transcription
Peters: I usually start out with my kick drum. It’s always the hardest to get right for some reason. Seems so simple, but it’s the most annoying process. And then I chop it up into a triplet at the very end. And then on top of that you have what’s called a reverse kick. You’ll actually take your kick wave and reverse it. And then once you have that perfected, the rest is kind of simple. You’re really just layering different aspects of percussion – it could be more toms that you might tweak in a little way to make some sort of bouncing rhythm. And you might have hi-hats, might be open, might be closed, to give it even more movement throughout the percussion section. Kind of like bouncing back and forth, because if something is just bang, bang, bang, it gets boring really quick.
And then once I have my percussion going, I will usually try to start out with some sort of atmospheric element, maybe like a clap that has an echo to it, just to kind of lead into the track like, OK more things are happening. And then I’ll start out with my first lead and then throughout that 30 minute to one hour mark, it’s going to start ramping up. Maybe not in speed, but maybe more stuff in the background noises that you’re mixing into your DJ set from other tracks that are going to start adding a little more energy to it. Eventually toward the end, it’s going to start peaking like the highest of my energy.
Roberts: Jimmy Peters creates techno music by starting with a simple beat from a drum machine and adding increasingly complex and specialized layers of sound on top. The final result is greater than the sum of its parts, creating music that can make you want to dance for hours and fully immerse yourself in the sounds he’s crafted. And now vision tech designers and entrepreneurs are innovating in similar ways. Instead of starting from scratch, they build upon existing networks and technology to advance and evolve accessible tech.
I’m Doctor Cal Roberts. This is On Tech & Vision and today’s big idea is the emergence of a new paradigm in the development of vision technology. One that is driven by a shift towards building on existing consumer technologies and fostering closer partnerships between developers. Let’s take a look at one example.
So, today my guest is Karthik Mahadevan and he is the President and CEO of Envision. Our listeners always love to know more about our guests. So tell us about yourself.
Mahadevan: Sure. My name is Karthik Mahadevan. I’m the co-founder of Envision. Envision is a company that we started about six years ago. It all started when I was a student here in The Netherlands. I was studying industrial design, and I happened to be looking for a topic to pick as my thesis, as you do towards the last year of a master’s education.
And I was at that time invited to a school for the blind in India to come and give a talk to the students there about the job opportunities that they could pursue in the future. And I was just there talking to the students, explaining what it means to be a designer, because that’s what I was studying to be. And I remember just telling all the kids in the room that, you know, a designer is just somebody who solves a problem. If you’re able to design a solution to a problem that you face, all of you could be a designer in the future.
And I remember towards the end of my talk, I asked them a question. I said, hey, if all of you were to become a designer tomorrow, what would be the problem that all of you would like to solve? And I remember all the kids in the room that day were like, I want to be more independent. I want to be able to go out and hang out with my friends, and be able to pick up a book and read it by myself. And this independence was like this strong emotion that they all wanted to experience, and that whole incident is what stuck with me.
So I remember that I came back to the university. I spoke to a professor of mine and said, this is something that I want to do as a thesis. This is something that I want to, you know, pick up and research a bit more. And that is all it was in the early stages. It was purely a research endeavor for me to speak to as many of the blind and low vision people in The Netherlands as I could and clearly understand what independence is for them. What do they mean when they say the word independence?
And as I started speaking to them, I understood that for a lot of them, independence almost always meant access to information. Because so much of the information around us happens to be in a visual form, their inability to access it is what causes dependencies in their lives. And yeah, that’s when I started to understand that the way to approach this problem is by figuring out how you make the information around you accessible. Can you figure out different ways of making this information accessible without having to change the environment itself?
I understood the impracticality of wanting to, you know, change the infrastructure. You cannot go around putting Braille stickers on everything around you. That’s when I started to take a deep dive into technologies of the day, like artificial intelligence and image recognition, and understanding how they can be used to understand the environment and have that information spoken out. And yeah, that’s sort of how the whole idea of Envision originated. And six years down the line, I’m still very, very excited about the possibilities that it unfolds.
Roberts: People who are sighted get information differently than people without sight. That’s where Envision comes in. Their software taps into the camera on a smartphone or smart glasses, and uses AI to interpret the visual cues in the user’s environment. It then relays that information to the user with computer-generated speech. Karthik explains how that might work in practice.
Mahadevan: So let’s take an example, right. Let’s say you walk into a train station. Now, if I am a sighted person, I walk in and I see the information that I’m looking for instantly, because it’s on a big display right in front of me. The issue is that that information is only being communicated in a visual way, because it’s a display, and it’s being communicated only to people who can see it. And if you happen to be blind, or you happen to have a vision impairment, you no longer have access to that information easily. You have to ask somebody else, or you have to open up an app and look for it. And I think that’s where the problem is: these information systems and these infrastructures are not built with complete accessibility in mind. And that’s the problem that Envision is attempting to solve.
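To make that pipeline concrete, here is a minimal illustrative sketch, not Envision’s actual code, of the loop described above: grab a camera frame, run it through a recognition step, and speak the result. It assumes OpenCV for camera access and pyttsx3 for offline text-to-speech, and the describe_frame() function is a hypothetical stand-in for the AI recognition model.

```python
# Illustrative sketch only -- not Envision's actual implementation.
import cv2        # camera access (assumed available)
import pyttsx3    # offline text-to-speech (assumed available)

def describe_frame(frame) -> str:
    """Hypothetical placeholder: a real system would run an image-recognition
    or scene-description model here and return a short text description."""
    return "A departure board listing platforms and train times."

def main():
    cap = cv2.VideoCapture(0)          # open the device camera
    tts = pyttsx3.init()               # initialize the speech engine
    ok, frame = cap.read()             # grab a single frame
    if ok:
        description = describe_frame(frame)   # visual cues -> text
        tts.say(description)                  # text -> computer-generated speech
        tts.runAndWait()
    cap.release()

if __name__ == "__main__":
    main()
```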
Roberts: So taking that same example. The train station has information and they choose to show it visually. How else could that information be shared?
Mahadevan: I don’t think there’s much else that can be done, just from a practicality standpoint, right? Given the way these infrastructures and these institutions are built, I don’t think the stations can do that much better. You can have audio announcements. You can have all of these other things that complement visual information. But those are often very good ideas in theory; in practice they don’t work as efficiently as we would hope. That’s why there is a need for tools like what Envision is building, which can help you understand information that otherwise would not be accessible.
Roberts: So for many years, consumer electronics was evolving in one direction, and assistive technology was evolving in a different direction. But what I’ve noticed recently is a merger of the two. Consumer electronics are becoming more accessible, and developers like Envision, instead of starting from scratch with de novo technology, are borrowing the consumer electronics that are already in development. You’re kind of a pioneer in that area. Explain.
Mahadevan: Yeah, exactly. I think when we started doing a deep dive into this problem of, you know, how do you make information accessible, we didn’t want to come up with very complex systems and tools, you know, very specialized solutions. We wanted something that people can just start to use very quickly, right? And one of the devices that has become commonplace for us is the smartphone.
Everybody now has access to a smartphone, and has access to a camera on that smartphone. And they’ve all been designed to be fairly accessible, and that’s why there has been this large-scale adoption of smartphones amongst everybody in the world. When we started to investigate, we were like, hey, everybody already has a smartphone. Let’s make a bet on the fact that the cameras on these devices are only going to improve from this point on every year, the processing capabilities on these devices are going to improve every year, and the kind of AI algorithms that can be built on these new processors is going to improve every year.
Those were the three technological bets that we made, and we came to the conclusion that we don’t have to build some sort of specialized hardware or specialized assistive technology. We just need to build something that sits on top of these existing consumer platforms, something that can directly leverage all these technological advancements that are going to happen anyway, and build a solution that will actually be helpful for the people who need it the most.
Roberts: As Karthik predicts, the computing technology of our smartphones is likely to continue to improve. Will there also be a leap from computers we carry to computers we wear?
Mahadevan: Throughout history, computing has shrunk in size, and it has all happened in the service of giving you faster access to information. That’s basically the thing that we’ve all collectively been optimizing for. Earlier, you had to have a bigger facility to have a computer. You know, you couldn’t just have a computer at home. You had to go to a university or somewhere big to access a computer, which would have your servers and whatnot.
Then came personal computers, where you could just have a computer at your home. You could connect it to dial-up Internet and access information when you’re at home, right? That made access to information somewhat faster than it was earlier.
And then came the laptops. You know, now you can have your personal computer on the go. So no matter where you are, you are in a train, in an airport, in a cafe. You can always open up your laptop and just access information and do your work when you’re on the move.
Then came the smartphones, you know, which sort of shrunk the laptop and put it in your pocket. Now you have a very powerful computer always with you, one that fits in your pocket. The processors that are on your iPhone today are superior to the processors that were on the Apollo mission. So that’s the kind of computing power that we are walking around with. And when you extrapolate from that point and ask what’s next, right? What’s the next level of shrinkage of technology that’s going to happen? Smart glasses are emerging as the next frontier, and I believe there will be a lot more available in the market as well. People are making all sorts of things: pins and necklaces and headbands and whatnot.
But what Envision has realized, having worked with smart glasses for a while, is that it’s just a good form factor to have something like a smart glass on your face, because, you know, your head tends to be a very good pointing unit in your body. It gives you exactly the right range of angles. It’s very easy for you to rotate your head and move it in different directions to look at specific things that you want to, and you can use different input mediums. You can use your eyes. You can use your fingers. You can use your voice to interact with these glasses. Because of that, I think smart glasses are going to be one of the winning wearable options out there.
Roberts: So one of the components of smart glasses is the camera or often two cameras. Explain the positioning of the camera in smart glasses.
Mahadevan: So, the camera is what makes smart glasses the most powerful, because I think the camera is one of the strongest sensors that we have developed, right? There are all kinds of sensors that we’ve been fooling around with, but what we have found out over the years is that there is no better sensor than a camera, because that’s the sense that humans use.
We don’t have LIDAR or IR and all of those sensors; we just use our eyes to know the depth of things. So a camera is probably the only sensor that smart glasses would need, and most smart glasses go with just one camera. As of now that’s mainly to do with the weight and the processing capabilities that are still not 100% there. But I think eventually we will probably evolve to having at least two cameras on these smart glasses, because that gives you a binocular view of the world. And once you have two cameras, you can a lot more easily start to estimate distances and detect obstacles and things like that, compared to a single camera.
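As a rough illustration of why a second camera matters, the standard stereo relation says depth equals focal length times baseline divided by disparity. The sketch below is a minimal example of that arithmetic; the focal length and baseline values are made-up assumptions, not the specs of any real glasses.

```python
# Minimal sketch of two-camera depth estimation: Z = f * B / d, where f is the
# focal length in pixels, B the distance between the cameras, and d the
# disparity (pixel offset of the same point between the two images).
def depth_from_disparity(disparity_px: float,
                         focal_length_px: float = 800.0,   # assumed value
                         baseline_m: float = 0.12) -> float:  # assumed value
    """Estimate distance to a point seen by both cameras.

    A larger disparity means the object is closer; zero disparity would mean
    the point is infinitely far away, so we reject non-positive values.
    """
    if disparity_px <= 0:
        raise ValueError("Disparity must be positive for a visible point.")
    return focal_length_px * baseline_m / disparity_px

# Example: a 24-pixel disparity with these assumed parameters is about 4 m away.
print(round(depth_from_disparity(24.0), 2))
```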
Roberts: Another important component of smart glasses is the microphone.
Mahadevan: I think the microphone on the smart glasses is pretty amazing. So here is an interesting bit of trivia that I only learned a month back. A month back I was in Seattle and I met with Babak, who was one of the original inventors of the Google Glass. This is the Google Glass that came out a decade ago, and it was all the rage because it was just so refreshing.
And what he was telling me was that they essentially had to invent speech-to-text for the Google Glass. Before that, speech-to-text the way it is today just did not exist, because when you have a computer or a laptop or a phone, you can just input text with the keyboard. There was never really a need for a very good speech-to-text system. But when they started building the Google Glass within the X Development “moonshot” space of Google, that is when they actually started very heavily researching and investing in the development of speech-to-text technology, because suddenly, when you start thinking about a smart glass where you don’t really have inputs like the keyboard, you need really accurate, really good speech-to-text. So the reason speech-to-text technology is so good today is because the folks at Google started to think about it all those years back for a smart glass.
Roberts: Google Glass was one of the first commercially available wearable computers when it debuted in April of 2013. It struggled to find commercial applications, and development stopped after just two years. But the technology platform these wearable computers opened up was ripe for a remix.
Mahadevan: There are two things that didn’t work for smart glasses. One of them is that the displays on these are horrible. Of all the smart glasses in the market, and I think I’ve played around with all of them, nobody has yet built a display that is, you know, unobtrusive, right? Every time you watch a concept video of these smart glasses, they show you an almost Iron Man kind of concept where you have these floating VR elements within your eyesight, and it just looks very, very good within the concept videos.
But the minute you put on AR smart glasses with a display, it immediately feels very off, right? It doesn’t really fit as well with your environment. You step out into the sunlight and it just stops working. So, people haven’t been able to craft a good enough display on a smart glass that allows people to both look at the world and also overlay a graphic element on top of what they are viewing in the world. I think having a bad display has been the main blocker that has been stopping smart glasses from being adopted in a wide way.
But the other problem has been the lack of good applications for smart glasses, right? And the thing that has changed in recent years on that front is conversational AI. The way AI has developed over the past couple of years is the key factor that has now changed, and it allows these smart glasses to start having useful and meaningful applications.
Especially the kind of image recognition that the AI of today is capable of, and the kind of things that you can do off those images. That’s what has now unlocked a paradigm for these smart glasses where, even if they aren’t able to have a good display, just by having very, very intelligent and intuitive AI on these glasses, it unlocks a lot of applications for all of us to develop and explore.
Roberts: Karthik often states that he doesn’t make hardware. Envision develops software, and working with existing smartphones, and later smart glasses, has helped the application reach a broader audience. A small-market company like Envision doesn’t have the capacity to make those devices itself, so it builds solutions that meet users where they already are. And they’re joining forces with other tech developers.
Otillio: I ran into Karthik at Seaside. We met at Seaside and he came out and he was like, hey, we’re doing this thing with Google Glass and well, one, we’d love to put Aira on there. And I’m like, I’ve been waiting for you. This is an awesome day.
Roberts: Troy Otillio is the CEO of Aira, a service that connects users with a visual interpretation agent through an app. The app enables the interpreter to access the camera on the user’s smartphone or glasses. From there, the interpreter can assist the user with everything from reading a letter to navigating an unfamiliar city to shopping for groceries. Aira recently partnered with Envision, weaving another layer into what Troy calls the fabric of technology for people with vision impairments.
Otillio: From that, we actually invested the time to do a commercial integration. And, you know, support each other. Some people call in to Aira support to get Envision support sometimes, you know? But we are very closely connected in that way, and we have plenty of happy customers that, you know, use Aira on Envision, use the AI from Envision, use the AI from Aira. It’s a combined solution that I think works really well.
Roberts: It’s a truism that there’s no such thing as a new idea. But the different combinations of existing ideas are so numerous that we may never run out of them. This has been referred to as the Taco Bell phenomenon: despite using only a finite number of ingredients, that particular fast food chain is able to create an enormous menu by endlessly combining and recombining the same raw materials. As Troy explains, there are countless unthought-of ways to combine the tech we already have and discover novel uses for things that already exist.
Otillio: A lot of new products coming out, they’re awesome. Great ideas. But there’s always that concern from the end user: when the automated thing doesn’t work, what am I going to do? Think of a navigation experience, right? A fear I think anybody would have: I’ve navigated someplace using some technology and it got me there. I might not have gone there on my own. But if the automated technology, the AI technology, for whatever reason doesn’t work, what am I going to do?
And so often you can look at Aira as just a layer, an accessibility layer, that exists for when an automated solution, a pure software solution, doesn’t work. Or when there is no solution, right? Call it a backstop. Call it a more general-purpose layer that you can always fall back on. And so there are companies that often come to us and go, hey, could we integrate such that if and when a user needed to, they could fall back to the Aira solution, and could we integrate so that the agent would already know what the task at hand is, or have the information right there? Separately, could the agent sometimes help facilitate the strategic effort of whatever you’re doing? And navigation is a good one to think about.
There is that question of, okay, I do want to, well, pick Walmart. Walmart’s on the brain. I want to get to Walmart, and I’m going to need this. I haven’t gone before, but I’m going to need to go down the block, catch a bus, transfer, maybe take a subway. It may be complicated. I have all these things to figure out. I can figure it out on my own. It’s going to take me time, but what if an agent could help with that executive planning? We can get on a call, share a map, and I tell you what I prefer to do, blah blah. And then the route is set up and sent to whatever this thing is, the named thing that’s going to help me get there.
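A toy sketch of the backstop pattern Troy describes, using hypothetical function names rather than Aira’s actual API: try the automated step first, and if it fails, hand the task context to a human agent so the user doesn’t have to re-explain what they were doing.

```python
# Illustrative sketch of a "fall back to the human accessibility layer" pattern.
# All names and behavior here are hypothetical, not Aira's real interface.
from dataclasses import dataclass

@dataclass
class TaskContext:
    goal: str              # e.g. "Get to Walmart"
    current_location: str  # last known position, however the app tracks it
    notes: str = ""        # anything the automated step learned along the way

def automated_navigation(task: TaskContext) -> bool:
    """Stand-in for a pure software/AI navigation attempt."""
    return False  # pretend the automated route failed this time

def connect_to_agent(task: TaskContext) -> None:
    """Stand-in for opening a live session with a visual interpreter,
    passing along the context so the agent already knows the task."""
    print(f"Agent session opened. Goal: {task.goal}. Location: {task.current_location}.")

def navigate(task: TaskContext) -> None:
    if automated_navigation(task):
        print("Automated navigation succeeded.")
    else:
        connect_to_agent(task)  # fall back to the human layer with full context

navigate(TaskContext(goal="Get to Walmart", current_location="Bus stop at Main & 3rd"))
```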
So, there’s a lot of ways. I think this is that fusion of human intelligence, artificial intelligence, new devices, and how do you create a kind of fabric. Look, I have so many apps on my phone. You know, I use all kinds of apps. But I have to pop between each of them and I lose the context. I have to reprogram what my task is over and over and over. And it’s fatiguing and it’s not efficient. What if there’s a fabric to kind of connect who I am and what I want to do across apps? I think that’s a great vision. I think we can participate in that, and that’s a little bit of the scenario I was describing just previously.
So, I think there’s a lot to unlock in this space by working together, because some people adopt and pioneer things before others.
Roberts: And as consumer technology improves, so does the tech for users who have low vision. Innovations like 5G and large language models are helping companies like Aira and Envision make the user experience even more seamless.
Mahadevan: That is exactly the future that I envision going forward: your AI will actually have integrations with all of these different services that you like to use. We already have options, for example, to integrate it with your e-mail, with your WhatsApp, with your calendar. In a similar way, if these technologies evolve the way they are doing, they will be able to integrate with a Walmart or a Target that is near you. And once you are in that location, you should be able to access information, like exactly where an item is located in the supermarket. Right?
We are playing around with a new kind of technology called spatial audio, which has been very fascinating, and we’ve built a prototype of it where you can put audio beacons within a supermarket. If you enter the supermarket, you can be like, hey, I’m looking for apples, and it starts to play a pulsating audio from exactly where the apples are in the supermarket.
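As a rough sketch of how such a beacon cue might be computed, the toy example below pans a pulsing sound toward the item’s direction and quickens the pulse as the shopper gets closer. The coordinates, heading convention, and tuning constants are all illustrative assumptions, not Envision’s implementation.

```python
# Toy spatial-audio beacon cue: pan toward the item, pulse faster when closer.
import math

def beacon_cue(user_xy, heading_deg, beacon_xy):
    """Return (left_gain, right_gain, pulses_per_sec) for a beacon cue.

    heading_deg uses the math convention: degrees counterclockwise from the
    +x axis, so a heading of 90 means the shopper is facing the +y direction.
    """
    dx = beacon_xy[0] - user_xy[0]
    dy = beacon_xy[1] - user_xy[1]
    distance = math.hypot(dx, dy)
    # Angle of the beacon relative to where the user is facing;
    # positive means the beacon is to the user's left, negative to the right.
    relative = math.degrees(math.atan2(dy, dx)) - heading_deg
    relative = (relative + 180) % 360 - 180        # normalize to [-180, 180)
    # Map to a stereo pan in [-1, 1], where +1 is fully right.
    pan = max(-1.0, min(1.0, -relative / 90.0))
    # Constant-power panning: split energy between the two ears.
    theta = (pan + 1) * math.pi / 4
    left_gain, right_gain = math.cos(theta), math.sin(theta)
    # Pulse faster as the shopper gets closer (capped at 8 pulses per second).
    pulses_per_sec = min(8.0, 2.0 + 6.0 / max(distance, 0.5))
    return left_gain, right_gain, pulses_per_sec

# Shopper at the entrance facing down the aisle (+y); apples 6 m ahead, 2 m to the right.
print(beacon_cue((0.0, 0.0), 90.0, (2.0, 6.0)))
```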
So, all of that is the future that we are heading towards. This is sort of the base layer, where we’re building this intelligent AI which understands your intent very, very well and offers information. But in the future, these agents, these AIs, are going to interact with other AIs. I think that’s where it gets really, really exciting, you know. A Walmart will have its own AI, a Target will have its own AI, and your AI can speak to that AI and ask for specific information. That’s when it becomes super, super exciting. It’s very much doable within the foreseeable future for all of us.
Roberts: Progress doesn’t come from reinventing the wheel, but from improving it. As we learned in this episode, building on existing solutions can actually propel technological innovation more efficiently, and to a wider audience. With groundbreaking advancements such as binocular smart glasses and intercommunicating AI currently in development, tomorrow’s innovators can build on an ever-expanding array of cutting-edge technologies to drive progress.
Karthik and Troy didn’t invent wearable computers, just like early techno musicians didn’t invent the drum machine, but they’re using that existing tech in such creative and novel ways that it might as well be entirely new. They’re making the world more accessible, navigable and engaging, and whether you’re part of the low vision community or not, that’s a beat we can all dance to.
Did this episode spark ideas for you? Let us know at podcasts@lighthouseguild.org and if you liked this episode, please subscribe, rate and review us on Apple Podcasts or wherever you get your podcasts.
I’m Doctor Cal Roberts. On Tech and Vision is produced by Lighthouse Guild. For more information, visit www.lighthouseguild.org. On Tech and Vision with Doctor Cal Roberts is produced at Lighthouse Guild by my colleagues Jane Schmidt and Anne Marie O’Hearn. My thanks to Podfly for their production support.
