Adept have embarked on building models that can take actions in the digital world, starting with their first large model, which they've called Action Transformer (ACT-1).
The pace at which AI has evolved over the past few years is truly amazing. Scaling up Transformers has produced incredible and exciting new capabilities: in language, GPT-3, GPT-3.5, GPT-4, PaLM and Chinchilla; in code, AlphaCode and Codex; and in image generation, DALL-E and Imagen.
What’s so intriguing about this new model?
Adept believes that general intelligence would be a system that can perform tasks as well as, if not better than, a human operating a computer. They have the highly ambitious goal of developing a system that can be trained to use every software tool, API and web app. The ACT-1 model is their first move towards this.
Adept believes the way we interact with computers will change: we will use natural language interfaces to tell computers what to do, rather than working through traditional GUIs and manual actions.
ACT-1 is a large-scale transformer that Adept have trained to use various digital tools. For example, they have trained it to use a common web browser. Currently the system is linked to a Google Chrome extension, which allows ACT-1 to see what's going on in the browser and then perform actions such as scrolling, selecting, clicking and typing.
Although the examples provided by Adept show a system operating at human speed rather than computer speed, there is a lot of improvement to come in future systems.
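To make the observe-then-act idea concrete, here is a minimal sketch of what such a loop could look like. It is not Adept's code or API: `FakeBrowser`, `Action`, `scripted_policy` and `run` are all hypothetical names invented for illustration, the browser is an in-memory stand-in rather than a real Chrome extension, and the "policy" is a hard-coded placeholder where a model like ACT-1 would sit.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """One browser action. All names here are hypothetical."""
    kind: str          # "scroll", "click", "type", or "done"
    target: str = ""   # element the action applies to
    text: str = ""     # text to enter for "type" actions


@dataclass
class FakeBrowser:
    """In-memory stand-in for a browser; a real extension would read the live page."""
    fields: Dict[str, str] = field(default_factory=dict)
    clicked: List[str] = field(default_factory=list)
    scroll_y: int = 0

    def observe(self) -> dict:
        # Snapshot of the current page state handed to the policy.
        return {
            "fields": dict(self.fields),
            "clicked": list(self.clicked),
            "scroll_y": self.scroll_y,
        }

    def apply(self, action: Action) -> None:
        if action.kind == "scroll":
            self.scroll_y += 500
        elif action.kind == "click":
            self.clicked.append(action.target)
        elif action.kind == "type":
            self.fields[action.target] = action.text
        else:
            raise ValueError(f"unknown action kind: {action.kind}")


def scripted_policy(instruction: str, observation: dict) -> Action:
    """Placeholder for the model: fill the search box, click submit, then stop."""
    if "search" not in observation["fields"]:
        return Action("type", target="search", text=instruction)
    if "submit" not in observation["clicked"]:
        return Action("click", target="submit")
    return Action("done")


def run(instruction: str, browser: FakeBrowser,
        policy: Callable[[str, dict], Action], max_steps: int = 10) -> List[Action]:
    """Repeatedly observe the browser, ask the policy for an action, and apply it."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(instruction, browser.observe())
        if action.kind == "done":
            break
        browser.apply(action)
        history.append(action)
    return history


if __name__ == "__main__":
    browser = FakeBrowser()
    for step in run("houses in Houston", browser, scripted_policy):
        print(step)
```

The point of the sketch is only the shape of the loop: the system looks at the page, picks one small action, applies it, and looks again until it decides the request is finished.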
Some of the nifty things that ACT-1 can do are the following:
- A user can type a high-level request and ACT-1 will then execute that instruction. The user simply enters the instruction in a text box and ACT-1 does the rest.
- ACT-1 can save a lot of time on time-consuming manual tasks that take multiple complex steps to complete. One example that Adept gives is that a task that might typically “take 10+ clicks in Salesforce can now be done with just a sentence”.
- ACT-1 can interact with spreadsheets as well, demonstrating real-world knowledge, inferring the contextual meaning of our commands and then acting on them. If you can think it and put it into words then ACT-1 can do the work.
- Another exciting feature of ACT-1 is its ability to complete tasks that require multiple tools. One example would be asking ACT-1 to find the contact information for a website and then compose an email. ACT-1 would search the internet, find the contact's email address, bring it into your email client and compose the email based on your instructions.
What’s Coming - Beyond ACT-1
With the advent of NLIs (natural language interfaces) you can probably now see why action transformers like ACT-1 are most certainly going to have an impact on how we interact with computers, phones and other connected devices. Adept believes that in a few years the following are likely to happen:
- That the way we interface with computers will be through natural language and not through the typical GUIs we currently use. In a few years we will tell computers what to do and they will perform the necessary tasks for us. They believe, and I agree with them, that the user interfaces of today will seem archaic.
- That beginners will be able to operate as power users with no special training necessary. Anyone who can put their ideas into natural language will be able to implement them.
- That documentation, manuals and FAQs will be intended not for people but for models.
- That we'll make bigger strides and achieve things at a quicker pace when we have AI as our teammate. This new ability to interact with systems will pave the way to quicker advances in drug design, engineering and materials science, and has the potential to disrupt virtually every part of business and society.
Adept's ACT-1 model stands to disrupt and change the way that we ultimately interact with computer systems. The ease of performing tasks through natural language will bring many efficiencies to both individuals and businesses. The time-saving potential of this system is incredible.
I for one am super excited about this system and believe that systems and platforms like ACT-1 are just the beginning of what will be possible with AI within the next 3 to 5 years.
Adept have put together some demonstrations of how ACT-1 works in the web browser in their blog post. I've signed up for their waitlist as I can't wait to try ACT-1. You can do so as well by clicking the “Join the waitlist” button on the homepage of their website.