Holiday Blog Update
- u6310128
- Jul 23, 2024
- 3 min read
It has been a busy holiday season for everyone here at Project Ver. With the proof of concept being tested with one of our ongoing stakeholders this Friday, both the software and hardware teams have been working diligently to get a prototype ready for testing.
Software Team Updates:
The software team has been focused on testing a variety of Large Language Models (LLMs) of different sizes and capabilities. Our goal is to enable the device to communicate using natural language while also discerning the amount and type of information useful to the end user. One avenue we are exploring is implementing a multimodal LLM, which can interpret a range of inputs (e.g., images, text, and audio) to generate comprehensive responses. This approach enhances the model's understanding of context and human interaction by considering both visual and textual inputs simultaneously.
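To make this concrete, here is a minimal sketch of what a single multimodal query could look like, assuming an OpenAI-style chat API. The model name and the `describe_frame` helper are illustrative assumptions rather than our final pipeline.

```python
# Minimal sketch of a multimodal query (image + text in one request).
# Assumptions for illustration: the OpenAI-style client, the "gpt-4o-mini"
# model name, and the describe_frame helper are placeholders, not our final stack.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def describe_frame(image_path: str, question: str) -> str:
    """Send one camera frame plus a text question to a vision-capable LLM."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```

A call such as `describe_frame("frame.jpg", "Describe what I am looking at")` would then return a short natural-language description that can be read back to the user.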
To further enhance the Ver device's capabilities, we have been developing a speech-to-text model. This will enable users to interact with the product using voice commands, allowing them to ask specific questions tailored to their needs, such as determining whether a product is gluten-free or locating a particular route on a bus schedule. While this model is under development, we are using a prompt-based system that lets users choose from pre-determined questions such as “Describe what I am looking at” or “Read the signs in front of me.” Over the coming weeks, we will refine these prompts to improve the device's output and response accuracy. If you have any suggestions for this prompt-based system, please email us at projectver.anu@gmail.com.
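As a rough illustration of how simple that interim layer can be, the pre-determined questions might live in a small lookup table keyed by the user's selection. The command names and wording below are hypothetical examples, not the final prompt set.

```python
# Illustrative prompt table for the interim prompt-based system.
# Keys and wording are example placeholders only.
PREDEFINED_PROMPTS = {
    "describe": "Describe what I am looking at in one or two short sentences.",
    "read_signs": "Read the signs in front of me, from left to right.",
    "gluten_free": "Based on the packaging, is this product gluten-free?",
}


def build_prompt(command: str) -> str:
    """Map a button press or menu selection to the full prompt sent to the LLM."""
    if command not in PREDEFINED_PROMPTS:
        raise ValueError(f"Unknown command: {command!r}")
    return PREDEFINED_PROMPTS[command]
```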
Our team has also made significant progress in reducing response latency. During this semester break, we achieved audio response times of under one second. Using the FastAPI library, we leverage its built-in server socket connections to reduce latency between the local hardware and our software agent, resulting in faster outputs for our end users.
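For context, the socket-based path might look something like the sketch below, assuming the built-in socket connections refer to FastAPI's WebSocket support. The endpoint name, message format, and `process_frame` placeholder are assumptions for illustration, not our exact implementation.

```python
# Sketch of a persistent, low-latency connection between the local hardware
# (client) and the software agent (server) using FastAPI's WebSocket support.
from fastapi import FastAPI, WebSocket

app = FastAPI()


def process_frame(frame: bytes) -> bytes:
    """Placeholder for the real LLM + text-to-speech pipeline."""
    return b""


@app.websocket("/stream")
async def stream(websocket: WebSocket) -> None:
    await websocket.accept()
    while True:
        # One open socket per session: no per-request connection overhead.
        frame = await websocket.receive_bytes()   # client -> server: image bytes
        audio_reply = process_frame(frame)        # run the processing pipeline
        await websocket.send_bytes(audio_reply)   # server -> client: audio bytes
```

Keeping the connection open for the whole session removes most of the per-request overhead compared with opening a fresh HTTP request for every frame, which is where the sub-second responses come from.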

Image description: The image showcases the system architecture of the Ver device. The flow diagram depicts each of the software components and functions running within the device. This includes things such as the video input, video stream, audio input and output and the user prompts.
The first component of the diagram describes the video input and video stream. The arrow flow is interrupted by two text boxes labeled "VDO Ninja" and "OBS Studio". These labels identify the live-streaming and broadcasting software used to capture and process the images taken by the end user. This software allows the Ver team to take live footage streamed from a phone and broadcast it to a computer where image processing can occur.
This video stream then flows, via an arrow, into two interrelated boxes labeled "client" and "server". The client refers to the part of the system that interacts directly with the user, for example capturing the input (audio input and image streaming) and presenting the output after processing. The server is the powerhouse for all the operations that happen behind the scenes: it runs the algorithms for TTS (text-to-speech) and operates the LLM. Once it has completed the necessary processing, the server sends the output back to the client.
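To give a feel for the client side of this split, a hedged sketch of the round trip might look like the following; the `websockets` usage, URI, and file name are illustrative assumptions rather than our actual client code.

```python
# Client-side counterpart: push one captured frame to the server and wait for
# the synthesized audio answer. URI and message framing are illustrative only.
import asyncio
import websockets


async def send_frame(uri: str, frame_path: str) -> bytes:
    async with websockets.connect(uri) as ws:
        with open(frame_path, "rb") as f:
            await ws.send(f.read())   # client -> server: image bytes
        return await ws.recv()        # server -> client: audio bytes


# Example: audio = asyncio.run(send_frame("ws://localhost:8000/stream", "frame.jpg"))
```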
Hardware Team Updates:
The hardware team has been working closely with the software team to ensure that the software requirements can be translated into physical hardware components. This includes defining RAM capabilities, power requirements, size limitations, Bluetooth and wireless capabilities, and overall ergonomic design.
We also discussed the idea of running the device entirely from the user’s smartphone through an application. This approach offers a simple user interface, powerful GPU capabilities, and a high-quality video stream. Available applications such as 'DroidCam', 'VDO Ninja', and 'OBS' allow iPhone footage to be livestreamed directly to a computer for image processing. This will be our approach for the proof of concept. However, while a smartphone meets the hardware requirements, we aim to develop a “hands-free” device activated by the push of a button for Version 1. To achieve this, we have ordered a Raspberry Pi 5 and Raspberry Pi Zero 2W and plan to begin testing both boards in the coming weeks. This testing will help us determine whether the device can run efficiently on the smaller Pi Zero or if it requires the additional power of the Pi 5.
With all this information in hand, the hardware team has started developing initial design prototypes in SolidWorks. In the coming weeks, we aim to have three preliminary designs ready for testing and iteration. Stay tuned for more updates!