There has been discussion about using ActivityPub (AP) for federated home automation or Internet of Things (IoT) applications. One of the scenarios I've seen discussed uses an AP message to set a thermostat temperature.
"Hey @thermostat@My_Home.example.com, please set the temperature to 19°C. Thanks!" -- Terence Eden, Fediverse of Things
I assume a request like this would be sent as a post from a client such as Mastodon. This might translate to a Create activity with a Note object having the request in the content property.
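Such a request might look like the following ActivityStreams sketch. The actor and recipient URLs are made up for illustration; only the Note's content matters for this experiment:

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Create",
  "actor": "https://example.social/users/alice",
  "to": ["https://my_home.example.com/users/thermostat"],
  "object": {
    "type": "Note",
    "content": "Hey @thermostat@My_Home.example.com, please set the temperature to 19°C. Thanks!"
  }
}
```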
How is the thermostat supposed to interpret this request? Given it is natural language, it seems like a potential task for an LLM. This article describes a few simple experiments using ChatGPT to interpret the request and generate device commands.
TL;DR - The initial results were surprisingly good.
To interpret the request, I'm using an LLM prompt with the following structure. I don't claim to be a prompt engineer and this was a quick experiment, so I'm sure the prompt could be improved.
Given the following JSON commands:
…
and the device names:
…
and the command "…"
What is the inferred JSON command? Put the query into the prompt property. Put a detailed description of your reasoning into the explanation property.
(where the "..." are placeholders.)
JSON command descriptions help the LLM map the requests to something that can be sent to a device controller. For example, a command might be:
To turn on a device:
{"device": "DEVICE_NAME", "action": "ON", "prompt": "PROMPT", "explanation": "EXPLANATION"}
The device name data associates a device description with the device's name. For this experiment, I only defined a few devices: a thermostat, living room lamp, bedroom lamp, and a bedroom fan. Notice I don't list the thermostat explicitly since the description and the name are identical.
For these experiments, I'm interactively entering these prompts into the ChatGPT web interface. For a real application, the commands and devices would be automatically inserted into the prompt placeholders and the OpenAI API would be used to generate the commands.
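Filling the placeholders automatically is straightforward string templating. Here's a minimal sketch (the function and variable names are mine, not part of any API); the OpenAI API call is shown commented out since it needs credentials:

```python
# Sketch of automated prompt construction. COMMANDS and DEVICES would be
# generated from the device controller's configuration in a real system.

COMMANDS = """\
To turn on a device:
{"device": "DEVICE_NAME", "action": "ON", "prompt": "PROMPT", "explanation": "EXPLANATION"}
To turn off a device:
{"device": "DEVICE_NAME", "action": "OFF", "prompt": "PROMPT", "explanation": "EXPLANATION"}
"""

DEVICES = """\
living room lighting is "livingroom_lamp",
bedroom fan is "bedroom_ventilator".
"""

def build_prompt(commands: str, devices: str, request: str) -> str:
    """Fill the three placeholders of the prompt template."""
    return (
        f"Given the following JSON commands:\n{commands}\n"
        f"and the device names:\n{devices}\n"
        f'and the command "{request}"\n'
        "What is the inferred JSON command? Put the query into the prompt "
        "property. Put a detailed description of your reasoning into the "
        "explanation property."
    )

prompt = build_prompt(COMMANDS, DEVICES, "It's cold")

# To send the prompt via the OpenAI API (requires an API key):
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(
#     model="gpt-4o",
#     messages=[{"role": "user", "content": prompt}],
# )
```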
The prompt, including command and device information, looks like:
Given the following JSON commands:
To turn on a device:
{"device": "DEVICE_NAME", "action": "ON", "prompt": "PROMPT", "explanation": "EXPLANATION"}
To turn off a device:
{"device": "DEVICE_NAME", "action": "OFF", "prompt": "PROMPT", "explanation": "EXPLANATION"}
To set a thermostat to a temperature:
{"device": "DEVICE_NAME", "action": "SET", "value": "TEMPERATURE", "prompt": "PROMPT", "explanation": "EXPLANATION"}
(if temperature is in Fahrenheit units, convert it to Celsius. It is also possible to set a relative temperature using + or - as a prefix. Only include the numeric value.)
If the command is a conjunction, consider each request separately and return a list of device commands.
and the device names:
living room lighting is "livingroom_lamp",
bedroom fan is "bedroom_ventilator".
and the command "..."
What is the inferred JSON command? Put the query into the prompt property. Put a detailed description of your reasoning into the explanation property.
The only placeholder now is for the actual request.
Note that the thermostat control command has some extra intelligence. I've instructed the LLM to convert Fahrenheit to Celsius for the device command. I've also instructed it that relative settings are possible (e.g., adjusting the temperature by a specified number of degrees).
Finally, I've instructed the LLM to accept multipart requests (conjunctions) and generate a list of commands in that case.
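Since a reply may be either a single JSON object or a list of them, whatever consumes the LLM's output needs to normalize both shapes before dispatching to a device controller. A small helper could do this (the function name and validation are illustrative, not from any library):

```python
import json

def parse_device_commands(reply: str) -> list[dict]:
    """Parse an LLM reply and always return a list of command dicts.

    Accepts either a single JSON object or a JSON list, and checks
    that each command carries the required keys.
    """
    parsed = json.loads(reply)
    commands = parsed if isinstance(parsed, list) else [parsed]
    for cmd in commands:
        for key in ("device", "action"):
            if key not in cmd:
                raise ValueError(f"missing {key!r} in command: {cmd}")
    return commands

# A single-command and a multi-command reply, shaped like the ones above:
single = '{"device": "thermostat", "action": "SET", "value": "+2"}'
multi = ('[{"device": "livingroom_lamp", "action": "ON"},'
         ' {"device": "thermostat", "action": "SET", "value": 20}]')
```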
For the experiments, I'll only specify the request for the placeholder. This request would be the content field of the AP Note object.
One difference between this approach and the one described in the Fediverse of Things article is that the actor in this article is not a device proxy. It's an intelligent home automation agent that can control multiple devices.
Experiment: Set the temperature
Request: "please set the temperature to 19°C. Thanks!"
This is the example request from the Fediverse of Things article.
Response:
{ "device": "thermostat", "action": "SET", "value": "19", "prompt": "please set the temperature to 19°C. Thanks!", "explanation": "The command 'please set the temperature to 19°C. Thanks!' indicates a request to adjust the temperature. Therefore, it is inferred that the thermostat should be set to 19°C as requested." }
ChatGPT correctly selected the SET command for the thermostat although the device was not explicitly mentioned.
Experiment: Implicit Request
Request: "It's cold"
Response:
{ "device": "thermostat", "action": "SET", "value": "+2", "prompt": "Adjust the thermostat to increase the temperature by 2 degrees Celsius.", "explanation": "The command 'it's cold' suggests a need for heating. Therefore, we're setting the thermostat to increase the temperature by 2 degrees Celsius to provide warmth. The value is specified as a relative temperature increase using the '+' prefix." }
I'm impressed. The LLM has determined that changing the thermostat by a relative amount will address the issue.
Experiment: Multiple commands
Request: "please set the temperature to 68°F and fix the darkness in the living room. Thanks!"
Response:
[ { "device": "livingroom_lamp", "action": "ON", "prompt": "Fix the darkness in the living room. Thanks!", "explanation": "The query mentions fixing the darkness in the living room, which implies turning on the lighting in the living room." }, { "device": "thermostat", "action": "SET", "value": 20, "prompt": "Please set the temperature to 20°C", "explanation": "The query requests setting the temperature to 68°F, which needs to be converted to Celsius. 68°F is approximately 20°C." } ]
The LLM converted the Fahrenheit temperature to Celsius for the thermostat. I didn't mention the living room light specifically, but the LLM still generated a lighting command to fix the reported darkness there.
Closing Thoughts
Given that this was a quick experiment, I found the results very interesting. They seem much better than what I've encountered from voice-based virtual assistant products.
Similar prompts could be used to query devices (get the temperature or humidity, for example).
I can also imagine prompts to implement automation rules by querying and including current properties (like current temperature or home occupancy state) and then conditionally generating an action command response.
This approach could benefit from existing tools and specifications. A toolkit like LangChain could be used to implement the prompt-construction pipeline. Although I invented a trivial command language for this experiment, using something like the Web of Things (WoT) architecture (or an equivalent) to implement the commands would be a better choice. Creating a WoT protocol binding for ActivityPub could also be an interesting project.
Since the commands and device names must be included in the prompt, this approach will not scale well when there are large numbers of commands or devices. However, it will probably work well in a typical home or garden automation scenario. That said, there may be ways to improve scalability through a hierarchical chain of prompting that first determines the device category for the request and then uses a category-specific prompt for the specific device commands.
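The hierarchical idea could be sketched as a two-stage chain. Everything here (the category names, command fragments, and helper functions) is hypothetical, and the actual LLM calls are omitted; the point is that each stage's prompt only carries a small slice of the full command and device catalog:

```python
# Stage 1 asks the LLM only for a device category; stage 2 builds a
# smaller prompt containing just that category's commands.

CATEGORY_COMMANDS = {
    "lighting": 'To turn on a device:\n{"device": "DEVICE_NAME", "action": "ON", ...}',
    "climate": 'To set a thermostat:\n{"device": "DEVICE_NAME", "action": "SET", ...}',
}

def category_prompt(request: str) -> str:
    """Stage 1: ask the LLM to pick a device category for the request."""
    cats = ", ".join(CATEGORY_COMMANDS)
    return f'Which device category ({cats}) does this request concern? "{request}"'

def command_prompt(category: str, request: str) -> str:
    """Stage 2: the usual command prompt, limited to one category."""
    return (
        f"Given the following JSON commands:\n{CATEGORY_COMMANDS[category]}\n"
        f'and the command "{request}"\nWhat is the inferred JSON command?'
    )

# First send category_prompt(request) to the LLM; supposing it answers
# "climate", the second-stage prompt would be:
stage2 = command_prompt("climate", "It's cold")
```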
That's an experiment for another day.