I’m interested in Artificial Intelligence and Robotics. With my recent success in gluing together a natural speech recognizer with a natural speech synthesizer, I’ve become more confident that my goals of creating an artificial humanoid-like robot are not necessarily just castles in the air.
I’ve been very slowly working on a project that I like to call Primus. In short, I hope Primus will become an Artificial Intelligence Server. This server would address many of the problems found in the field of Artificial Intelligence. At the same time, I plan for this server to interface with robotic systems, which would themselves offer high-level abstractions for various physical tasks, such as perception and motion.
As I’ve written before, Primus is a collection of libraries/components that are expected to work together to form a general artificial intelligence server/runtime environment. Proposed deployments and applications include mobile robotics, information processing, pattern recognition, and social interaction simulations. The components are language-specific, but the framework itself is not. The framework is meant to support interfacing with other intelligent-agent software, such as common computer vision libraries, simulation environments, natural language processors, external neural networks, and so on. It is meant to run in an environment that supports each component independently, where each component communicates both internally and externally through messaging mechanisms such as sockets, serial I/O, HTTP, TCP/IP, RESTful APIs, XML-RPC, etc.
About two years ago I dreamed up building a ‘robot teddy bear’. The idea is that it could provide companionship as a consumer toy, but also be used in more clinical settings as a proxy therapist for mental health treatment. I imagined the teddy bear to be very advanced, as my imagination is wont to do. This means that it could listen to speech, comprehend it, interpret visual stimuli, receive tactile stimuli, move its body, make facial expressions, walk, and generate natural language.
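To make the messaging idea a little more concrete, here’s a minimal Python sketch of one possible wire format. Each message is a single line of JSON with a sender, a kind, and a payload, written over a socket. The envelope fields (`sender`, `kind`, `payload`) and the function names are hypothetical and not part of Primus. `socket.socketpair()` just stands in for a real connection between two components.

```python
import json
import socket
from typing import Any, Dict, TextIO


def send_message(sock: socket.socket, sender: str, kind: str, payload: Dict[str, Any]) -> None:
    """Serialize one message as a single line of JSON and write it to the socket."""
    envelope = {"sender": sender, "kind": kind, "payload": payload}
    sock.sendall((json.dumps(envelope) + "\n").encode("utf-8"))


def receive_message(stream: TextIO) -> Dict[str, Any]:
    """Read one newline-terminated JSON message from a text stream wrapping a socket."""
    line = stream.readline()
    if not line:
        raise ConnectionError("peer closed the connection")
    return json.loads(line)


if __name__ == "__main__":
    # socketpair() gives two connected sockets, standing in for two components.
    hub_side, recognizer_side = socket.socketpair()
    send_message(recognizer_side, "speech-recognizer", "utterance", {"text": "hello rodge"})
    with hub_side.makefile("r", encoding="utf-8") as stream:
        print(receive_message(stream))
    hub_side.close()
    recognizer_side.close()
```

Newline-delimited JSON is only one option. The same envelope could just as easily travel over HTTP or XML-RPC.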
That’s very ambitious.
I kind of got into it on a very abstract level, did my usual daydreaming about it, and then kind of moved on as I thought, ah well, it’s too complicated. This is something I tend to do often.
My interest in the project was rekindled when I discovered this at Barnes & Noble one day:
Yes. That is a robotic teddy bear. Admittedly a very simple one, but a consumer product capable of several functions nonetheless. It even receives content updates, so it stays fresh and new as time goes on.
I researched a little bit on the net and found three other robotic teddy bear models scheduled to hit the market in 2016.
This is the power of an idea not pursued. I have lots and lots of ideas I don’t pursue. But I didn’t expect to see rudimentary implementations appear so soon. Yet here they are. It’s a little self-defeating when I look at it and go, “Oh well, they made one first, guess I can’t make one.” I can’t fault or blame them for anything; after all, I was the one who decided not to work on the idea, not them.
Maus says the robot teddy bear that I imagine is much more complicated and sophisticated than the ones being sold now. I hope so, but once something hits the market it doesn’t take long to evolve into something very sophisticated.
I got my inspiration from the movie A.I. Artificial Intelligence. In the movie, there is a teddy bear called a ‘Super Toy’ that can move and talk. Here are some clips:
I have a stuffed ‘teddy-dog’ named Podge. It’s technically a Russ Berrie Puppy Dog Podge.
I got him about ten years ago when I moved to Westminster and decided he needed a name. So I looked on his tag, and there was his name, Podge.
I actually wrote a short story about a boy and his ‘living’ toys titled “Impossible Things”. I’ll re-publish it here at some point, but right now you can find it on my old blog.
If I’m going to make a robot teddy bear I must start where I start any project of mine: with a name. I always come up with a name first, and in that way I can lend a certain amount of focus to whatever I’m working on. It’s also very handy to have a name because then I can assign things to it and talk about it, as well as organize everything related to it in my notebooks and on my computer.
I’ve decided my robot teddy bear will be named Rodge. This is a nod to Podge, as robot starts with the letter R, and also a nod to my friend Bucky’s character Roghroo. It is NOT a nod to my Starbucks friend Roger, a dirty old man.
So, now we have a robot named Rodge. Or, we will have a robot named Rodge.
I…. Need… Braaaains!
The first thing that has to happen is that it needs some way of functioning. Functioning is a large domain, encompassing everything Rodge can do, including speech recognition, movement, concept formation, etc. To me, there are three things that are most important to consider: a way to communicate with the entity, and a way for the entity to communicate back. Once these have been established, so that some form of dialogue is possible, the entity can expand its capabilities.
In my previous post, I demonstrated a very basic local-machine speech recognition system, along with a very basic local-machine speech synthesis system. It could reply to my input with output. That was a dialogue, and though it was rudimentary, it was very cool.
As a side note, when I approached Ninja about building a robot teddy bear, one of the things he and I discussed was the speech recognition system. At the time, and even now, most speech recognition systems use the ‘cloud’ to process information. The microphone picks up the signal, and it’s transmitted to a central server where huge neural networks and other algorithms process the data (more likely Markov models than neural networks). Thus, if I wanted speech recognition in my robot teddy bear, I’d have to use some kind of existing internet service (possibly not free) AND my bear would have to be connected to the internet to function.
I didn’t like this option and hoped I could somehow develop speech recognition capabilities without the aid of a server. I noticed that the speech recognition software available with OS X had an ‘offline’ mode that used a large set of data downloaded to the machine. Of course, this is a black box to me; I have no idea what they’re doing. But it gave me hope that I could potentially do the same.
Well, now I have. It’s extremely rudimentary: it has many false starts and false matches, and it has difficulty interpreting some words like ‘song’ (very probably a result of its initial data set). It also can’t discern between two different voices. But it works at some level, and that’s what I want. Beginnings are beginnings, right?
Okay, I digress, but you get the general idea. I said three things before and only listed two, so I’ll expand a little. I’m trying to create an autonomous entity. This entity can be considered a black box of functionality. I create functions and methods that take input and produce output all the time. However, this entity is a little different. First, it takes in input, but not just any input. There have to be mechanisms to regulate and classify information before it reaches the entity. Second, the entity doesn’t just ‘process’ information to produce an output, which is a model I’ve been hung up on; instead, it processes the input to form a data representation. This data representation is the heart of the beast. From there, it takes an imperative from the input and uses the data representation to generate an output. This output may or may not relate directly to the input given to the entity.
So that’s actually four sets of algorithms: input filtering, data representation storage, inference and reasoning over the data, and output generation as a way to communicate.
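The four sets of algorithms can be sketched as a toy Python pipeline. This is a hypothetical illustration, not Rodge’s design. The filter only accepts sentences shaped like “X is Y” or “is X Y?”, the data representation is a plain set of facts, and the ‘inference’ is a simple lookup that stands in for real reasoning.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class Entity:
    """Toy entity: input is filtered, stored as facts, reasoned over, then answered."""

    facts: Set[Tuple[str, str]] = field(default_factory=set)

    def filter_input(self, raw: str) -> Optional[Tuple[str, str, str]]:
        # 1. Input filtering: accept only "X is Y" statements and "is X Y?" questions.
        words = raw.lower().strip(" ?.!").split()
        if len(words) == 3 and words[1] == "is":
            return ("tell", words[0], words[2])
        if len(words) == 3 and words[0] == "is":
            return ("ask", words[1], words[2])
        return None  # everything else is rejected before reaching the entity

    def store(self, subject: str, attribute: str) -> None:
        # 2. Data representation storage: here, just a set of (subject, attribute) facts.
        self.facts.add((subject, attribute))

    def infer(self, subject: str, attribute: str) -> bool:
        # 3. Inference: a plain lookup, standing in for real reasoning.
        return (subject, attribute) in self.facts

    def generate_output(self, known: bool, subject: str, attribute: str) -> str:
        # 4. Output generation: turn the result back into a sentence.
        if known:
            return f"Yes, {subject} is {attribute}."
        return f"I don't know whether {subject} is {attribute}."

    def hear(self, raw: str) -> str:
        parsed = self.filter_input(raw)
        if parsed is None:
            return "I didn't understand that."
        imperative, subject, attribute = parsed
        if imperative == "tell":
            self.store(subject, attribute)
            return "Okay."
        return self.generate_output(self.infer(subject, attribute), subject, attribute)


if __name__ == "__main__":
    rodge = Entity()
    for line in ["Podge is soft.", "Is Podge soft?", "Is Podge blue?", "Sing a song"]:
        print(line, "->", rodge.hear(line))
```

The point of the toy is the separation of stages. The answer to “Is Podge soft?” comes from the stored representation, not from the question itself.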
I’ve developed Primus\Falcraft somewhat haphazardly as more of a mental exercise and not really directed at a particular purpose. The idea in general was to develop data structures and algorithms in pure PHP that could facilitate the construction of artificial intelligence.
This is still the aim, but things may have changed a bit.
At first, naïvely, I decided I’d implement an entire artificial intelligence in PHP, porting other libraries to the language. That’s not a bad idea, but it’s not very practical. First off, poring over and porting other libraries is time-consuming, and after they’re ported any upgrades to the library aren’t reflected in your port.
With the success I’ve had with the open-source libraries I found for speech recognition and synthesis, I’ve decided that that plan is outdated. With a little hacking, I was able to glue two systems together using Python and produce an extremely rudimentary ‘artificial intelligence’ in the span of two days. This gives me quite a bit of hope.
So the general idea, then, is to split the overall goal of the system into smaller, more specialized systems that communicate with each other, which was the original plan of Primus. I plan to use the Falcraft system written in PHP as the central ‘hub’, providing the glue necessary for what I call ‘agency’. Falcraft is named after the surname of my fictional alter ego, Kadar. He’s a half-wolf, half-skunk, eight-year-old wunk. He’s other ages too, but I generally imagine his canonical version as eight. The ‘agency’ library will provide the central loop and the processing of all external data, and act as a pseudo-bridge to the knowledge representation and automated motivation systems that decide how to act.
I imagine each component as a ‘server’ with client interfaces. This allows each system to communicate with the others in such a way that the only coupling between them is the data format. Even that isn’t much of a problem, since it’s possible to construct filters that convert different formats into something universal. This also makes it easy to add and swap out libraries as things improve and upgrade.
So I need to come up with various subsystems so that I can start isolating and focusing their purposes within Primus. This, of course, in my book, means some initial naming. Let’s get to a rough draft:
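Here’s a rough sketch of the kind of central loop an ‘agency’ library might provide. The hub drains a message queue and dispatches each message to whatever handlers subscribed to its kind. It’s written in Python rather than PHP for brevity. The `Hub` class and the message shape (a dict with a `kind` key) are assumptions for illustration.

```python
import queue
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Handler = Callable[[Message], None]


class Hub:
    """Hypothetical central hub: queues incoming messages and dispatches them by kind."""

    def __init__(self) -> None:
        self.inbox: "queue.Queue[Message]" = queue.Queue()
        self.handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, kind: str, handler: Handler) -> None:
        self.handlers.setdefault(kind, []).append(handler)

    def post(self, message: Message) -> None:
        # queue.Queue is thread-safe, so components could post from their own threads.
        self.inbox.put(message)

    def run_once(self) -> int:
        """Drain the inbox (including messages posted by handlers); return the count processed."""
        processed = 0
        while True:
            try:
                message = self.inbox.get_nowait()
            except queue.Empty:
                return processed
            # Messages with no subscribed handler are simply dropped.
            for handler in self.handlers.get(message["kind"], []):
                handler(message)
            processed += 1


if __name__ == "__main__":
    hub = Hub()
    # React to speech by posting a reply meant for a speech synthesizer.
    hub.subscribe("utterance", lambda m: hub.post({"kind": "say", "text": "You said " + m["text"]}))
    hub.subscribe("say", lambda m: print("SAY:", m["text"]))
    hub.post({"kind": "utterance", "text": "hello"})
    hub.run_once()
```

A real daemon would call `run_once()` repeatedly, or block on the queue, instead of running it a single time.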
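Here’s a minimal Python sketch of the ‘filters’ idea. Each input format gets a small function that converts it into one universal message shape, so components only need to agree on that shape. Both input formats and the universal shape (`source`, `kind`, `data`) are hypothetical.

```python
import json
from typing import Any, Callable, Dict

# Assumed universal shape: {"source": str, "kind": str, "data": Any}
UniversalMessage = Dict[str, Any]
FilterFn = Callable[[str], UniversalMessage]
FILTERS: Dict[str, FilterFn] = {}


def register_filter(fmt: str) -> Callable[[FilterFn], FilterFn]:
    """Decorator that registers a converter from one input format to the universal shape."""
    def wrap(fn: FilterFn) -> FilterFn:
        FILTERS[fmt] = fn
        return fn
    return wrap


@register_filter("json")
def from_json(raw: str) -> UniversalMessage:
    # Hypothetical JSON format: {"component": ..., "type": ..., "body": ...}
    obj = json.loads(raw)
    return {"source": obj["component"], "kind": obj["type"], "data": obj.get("body")}


@register_filter("pipe")
def from_pipe(raw: str) -> UniversalMessage:
    # Hypothetical plain-text format: "component|type|body"
    source, kind, data = raw.split("|", 2)
    return {"source": source, "kind": kind, "data": data}


def normalize(fmt: str, raw: str) -> UniversalMessage:
    """Convert a raw message in a known format into the universal shape."""
    if fmt not in FILTERS:
        raise ValueError(f"no filter registered for format {fmt!r}")
    return FILTERS[fmt](raw)
```

With this arrangement, swapping in a new library means writing one more filter rather than changing every component that talks to it.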
First, we have to address some software abstractions.
- The central hub (Falcraft), currently written as a PHP daemon. This server coordinates all the other servers, providing the glue between them.
- This is the central data server. It allows various interfaces to and encapsulations of data. Examples could include AtomSpace or a REST file server. Why a REST file server, you might ask? As an example, a URI could point to a file whose path moves around while still presenting a unified front.
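The REST file server example boils down to an indirection table. Clients use a stable URI, and the server looks up wherever the file currently lives. This is a minimal Python sketch with a hypothetical `StableURIIndex` class; a real server would call `resolve()` while handling a GET request.

```python
from typing import Dict


class StableURIIndex:
    """Maps stable resource URIs to whatever file path currently holds the data."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def publish(self, uri: str, path: str) -> None:
        """Expose a file under a URI that clients can keep using."""
        self._locations[uri] = path

    def move(self, uri: str, new_path: str) -> None:
        """Record that the file behind an existing URI now lives somewhere else."""
        if uri not in self._locations:
            raise KeyError(f"unknown URI: {uri}")
        self._locations[uri] = new_path

    def resolve(self, uri: str) -> str:
        """Return the current file path for a URI; this is what a GET handler would read."""
        return self._locations[uri]


if __name__ == "__main__":
    index = StableURIIndex()
    index.publish("/memories/podge", "/data/2016/podge.json")
    index.move("/memories/podge", "/archive/podge.json")
    print(index.resolve("/memories/podge"))  # the client-facing URI never changed
```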
Next would be the sensory input systems.
- I’m thinking this will be a ‘parent’ server that produces more formal, formatted outputs for Falcraft from the sensory systems in a somewhat interrupt-like fashion. It receives input from the sensory systems and converts that input into a particular output framework.
- This is the audio ‘parent’ server. It collects and interprets audio data as necessary. This includes all auditory inputs, such as speech, music, environmental sounds, etc. It also augments the audio/video subsystems by acting as a trainer (I can’t remember the AI term for it), matching unprocessable or incorrectly processed data back to the correct model in the subsystems.
- This is the speech recognition system. The basic idea is to convert what somebody says into a textual representation. It also regulates speech input, for example by detecting a phrase that initiates speech input, and it configures the audio hardware. The system would also discern between voices and collect intonation and other auditory subtext. At first, this will be the only auditory recognition system available.
- This would be the tactile sensor system. It would allow the robot to configure and interact with hardware that detects pressure applied to parts of the robot. This may seem esoteric, but it’s very important for detecting the placement of objects and of the robot itself in the environment, supporting things such as collision detection.
- The graphical aspect. This interprets external visual data in the form of video and images. It integrates the visual hardware, such as a dual-camera setup for three-dimensional processing, as well as the software necessary for feature detection and model abstraction.
The next step would be knowledge representation. This is the sketchiest part of the rough draft, but I think the general idea would be:
- This takes the linguistic information provided by Synthesizer and ‘decodes’ it into a form of grammar. Essentially, this is the initial natural language processing element. It decodes the input and tags its parts of speech, producing a structure better suited to semantic networks and ontological hierarchies.
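The ‘interrupt-like’ behavior of the sensory parent server could be roughly approximated with a priority queue. Subsystems signal events, and the most urgent event is handed to the hub first in a uniform format. This Python sketch is not truly preemptive. The priority ordering (tactile before audio before visual) and all names are assumptions for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Assumed urgency: lower number is handled first (touch first, e.g. for collisions).
PRIORITY: Dict[str, int] = {"tactile": 0, "audio": 1, "visual": 2}


@dataclass(order=True)
class SensoryEvent:
    priority: int
    sequence: int  # tie-breaker keeps equal-priority events in arrival order
    modality: str = field(compare=False)
    data: Any = field(compare=False)


class SensoryParent:
    """Hypothetical parent server collecting events from the sensory subsystems."""

    def __init__(self) -> None:
        self._pending: List[SensoryEvent] = []
        self._counter = itertools.count()

    def signal(self, modality: str, data: Any) -> None:
        """Called by a subsystem, roughly like raising an interrupt."""
        event = SensoryEvent(PRIORITY[modality], next(self._counter), modality, data)
        heapq.heappush(self._pending, event)

    def next_event(self) -> Optional[Dict[str, Any]]:
        """Pop the most urgent pending event, converted to a uniform dict for the hub."""
        if not self._pending:
            return None
        event = heapq.heappop(self._pending)
        return {"modality": event.modality, "data": event.data}
```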
- This processes all the sounds not covered by the Unscrambler, as well as the visual and tactile information.
- This is the central knowledge representation system. It utilizes various methods such as hypergraphs, frames, semantic networks, and Markov models.
- This is the reasoning engine that operates upon Universe-provided information. It implements such things as first-order logic, fuzzy logic, propositional analysis, theorem proving, and more.
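To give a feel for the kind of structure the knowledge representation system might hold, here’s a toy Python semantic network. It has is-a links and property inheritance, plus a trivial inference over them. It’s a deliberately tiny stand-in for the hypergraphs, frames, and logics listed above, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set


class SemanticNetwork:
    """Toy semantic network: concepts linked by is-a edges, each with its own properties."""

    def __init__(self) -> None:
        self.isa: Dict[str, Set[str]] = {}
        self.props: Dict[str, Dict[str, str]] = {}

    def add_isa(self, child: str, parent: str) -> None:
        self.isa.setdefault(child, set()).add(parent)

    def add_property(self, concept: str, key: str, value: str) -> None:
        self.props.setdefault(concept, {})[key] = value

    def ancestors(self, concept: str) -> List[str]:
        """Breadth-first walk up the is-a links, nearest ancestors first."""
        seen, order, frontier = {concept}, [], deque([concept])
        while frontier:
            node = frontier.popleft()
            for parent in sorted(self.isa.get(node, set())):
                if parent not in seen:
                    seen.add(parent)
                    order.append(parent)
                    frontier.append(parent)
        return order

    def is_a(self, concept: str, category: str) -> bool:
        """Inference: is-a is transitive, so check every ancestor."""
        return category in self.ancestors(concept)

    def lookup(self, concept: str, key: str) -> Optional[str]:
        """Inherit a property from the nearest concept that defines it."""
        for node in [concept] + self.ancestors(concept):
            if key in self.props.get(node, {}):
                return self.props[node][key]
        return None


if __name__ == "__main__":
    net = SemanticNetwork()
    net.add_isa("podge", "stuffed dog")
    net.add_isa("stuffed dog", "toy")
    net.add_property("toy", "alive", "no")
    net.add_property("stuffed dog", "texture", "soft")
    print(net.is_a("podge", "toy"), net.lookup("podge", "texture"))
```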
The final step is the generation of activity (output). This takes a multitude of forms:
- Intrinsic and external motivation engine. This would allow the system, for instance, to actively learn about a subject, clarify meanings, or decide to do something. This ties into the other activity systems.
- This is the motion planning layer, which is important in any robotic system. It is able to detect and represent the robot’s placement and motion in its environment. It would handle such things as standing, walking, balancing, grabbing, pushing, etc.
- The language generator. A specialized system would be used for natural language generation. It would use various models such as template phrases, topic selection, and grammar formation.
- The specific speech synthesis engine. This is straightforward: producing discernible sounds that people can understand.
- This is the data generator for various other forms of expression, such as non-linguistic sound, or complex programming such as genetic algorithms and classifiers. This would be data that could be reasoned about and employed but could not feasibly be expressed in other formats.
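The template-phrase and topic-selection approach mentioned for the language generator can be sketched in a few lines of Python with `string.Template`. The topics, the templates, and the `select_topic()` rule are hypothetical examples, not a design decision.

```python
import random
from string import Template
from typing import Dict, List, Optional

# Hypothetical phrase templates, grouped by topic.
TEMPLATES: Dict[str, List[Template]] = {
    "greeting": [Template("Hello, $name!"), Template("Hi there, $name.")],
    "fact": [Template("Did you know that $subject is $attribute?")],
}


def select_topic(context: Dict[str, str]) -> Optional[str]:
    """Crude topic selection: share a fact if the context has one, otherwise greet by name."""
    if "subject" in context and "attribute" in context:
        return "fact"
    if "name" in context:
        return "greeting"
    return None


def generate(topic: Optional[str], slots: Dict[str, str], rng: Optional[random.Random] = None) -> str:
    """Fill a randomly chosen template for the topic; fall back to a stock phrase."""
    choices = TEMPLATES.get(topic) if topic else None
    if not choices:
        return "I'm not sure what to say."
    return (rng or random.Random()).choice(choices).substitute(slots)


if __name__ == "__main__":
    context = {"name": "Maus"}
    print(generate(select_topic(context), context))
```

Templates are the crudest form of generation, but they give a speech synthesizer something to say while grammar-based generation is still far off.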
And that’s it! Yeah right, that’s it. Oh my god, until I enumerated it I had no idea how complex and large this system would be. I could spend a lifetime programming only one of these systems! How am I going to be able to integrate and cover each of these systems?
That, I don’t know. But it’s a start. A very ambitious, meager, humbling start.