Object Following Robot



In this demo, we continue to showcase QikEasy’s Virtual Wireless Sensor feature by using it together with your desktop computer to perform object recognition and follow an object (we use a bottle in our demo) as it moves around in physical space.  Such a task normally cannot be done by a LEGO robot on its own.  The project uses a mobile phone’s camera (mounted on the robot) as the video input, uses the desktop computer to perform object detection, and then sends the object’s detected location to your robot via QikEasy’s Virtual Wireless Sensor functionality.  The robot makes calculated movements so that it always faces the object and follows it around.





Architectural Overview


The project uses OpenCV and YOLOv8 for performing the object recognition on the computer.  Out of the box, YOLOv8 is able to recognize a number of objects.  In our demo, we chose to use a bottle as the object that the robot will follow.


Here’s a flow chart showing the relevant components and how data flows between them:


Required Materials


To follow this tutorial, you should prepare the following before starting the project:


    • Your LEGO Spike Prime or Robot Inventor system.


    • An iPhone or an Android phone


    • Your QikEasy Adapter must be set up to run in Virtual Wireless Sensor mode.  See instructions on how to set that up here.


    • A desktop or laptop computer that is set up with Python and the OpenCV library for Python.  You may install OpenCV through the command:
      pip install opencv-python
    • You will need to install the YOLOv8 library for Python.  You may install it through these commands:
      git clone https://github.com/ultralytics/ultralytics.git
      cd ultralytics
      pip install -r requirements.txt
      pip install ultralytics
      cd ..


    • You may also need to install the “requests” module for sending HTTP requests if you don’t already have it:
      pip install requests


Your smartphone, the computer and the QikEasy Adapter should all be connected to the same Wi-Fi network.

Building your LEGO Robot


Please refer to the Face Tracking Robot video (https://youtu.be/mTkCwWGSBXk) for detailed instructions on how to build the robot.  Note that the instructions are for an iPhone XS Max.  Depending on your phone’s size, you may have to try different LEGO brick configurations to make your phone fit.


DISCLAIMER:  You should also make sure you secure your phone properly. We will not be responsible for any damage to your smartphone.


The Video Stream


The robot relies on the mounted smartphone to stream video of what the robot sees to the computer.  Your phone will act as an IP webcam.  Depending on your phone’s OS, there are different FREE apps we found that will turn your phone into an IP webcam.


For iPhones:



The app that works for us is called “IP Camera Lite”.


To set up the MJPEG stream on iPhone, download and open “IP Camera Lite” from the App Store.

    • Once opened, tap “Turn on IP Camera Server” at the bottom.
    • Tap the “Connect?” link at the bottom, and you will see the URLs for different protocols. 
    • Enter the URL for MJPEG in a new browser tab to make sure it works.  When prompted for a username/password, enter admin and admin.
    • If it works, note down the video stream URL, which will be needed in the Python script later.  The video URL will be in the form “http://{VIDEO_URL_USERNAME}:{VIDEO_URL_PASSWORD}@{iPhone’s IP Address}/video”, where the default username and password are both “admin”.
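If you would like to double-check the URL before wiring it into the script, you can assemble it in Python.  The helper below is our own illustration (not part of the project code); the URL-encoding guards against special characters if you have changed the default password:

```python
from urllib.parse import quote

def build_stream_url(ip_address, username="admin", password="admin"):
    """Build an MJPEG stream URL in the form IP Camera Lite expects."""
    return f"http://{quote(username)}:{quote(password)}@{ip_address}/video"

print(build_stream_url("192.168.1.50"))  # -> http://admin:admin@192.168.1.50/video
```

Paste the printed URL into a browser tab; if the video appears, the same string can be used as VIDEO_STREAM_URL later.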


For Android Phones:




You may use the app called “IP Webcam”, as demonstrated by this video.

The OpenCV Python Code




Just make sure you have followed the steps in the requirements section and installed the OpenCV and YOLOv8 libraries.


When you run the Python program for the first time, it will automatically download the default model for the objects that YOLOv8 can recognize.  The process may take a minute or two.


Key Points To Note:



    • We cannot perform HTTP GET requests synchronously within the loop that captures and processes each frame, as waiting for each HTTP response would make the loop too slow.  Instead, we run the request-sending loop in a separate thread, which keeps the video frame processing responsive.


    • Because the IP camera’s MJPEG video stream buffers multiple frames, if the viewer doesn’t fetch frames fast enough, the viewed video lags noticeably.  Since each frame must be processed, the Python program cannot keep up with the streamed video buffer; with the original development version of the code, the output video exhibited an obvious delay of 5 seconds or more.  We found that the simplest way to solve this problem is to skip several frames for each frame we run object detection on.  In the current code, we process only 1 out of every VIDEO_FRAMES_TO_SKIP frames.  You may try changing this variable to see its effect on lag and performance.
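The frame-skipping idea can be isolated as a small filter.  The helper below is our own illustration (not part of the project code) of the same modulo test the main loop performs:

```python
def keep_every_nth(frames, n):
    """Yield only 1 out of every n frames - the same modulo test the main loop uses."""
    for count, frame in enumerate(frames, start=1):
        if count % n == 0:
            yield frame

# With n=5, only frames 5, 10, 15, ... survive; the rest are discarded
print(list(keep_every_nth(range(1, 16), 5)))  # -> [5, 10, 15]
```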


Customizing Settings for your Environment:


You should customize these settings before you run the program:

    • VIRTUAL_SENSOR_IP_ADDRESS – This should be set to your QikEasy Virtual Wireless Sensor’s IP address.
    • VIDEO_URL_USERNAME – This should have the value “admin” if you have an iPhone and you haven’t changed it in the “IP Camera Lite” app.
    • VIDEO_URL_PASSWORD – This should have the value “admin” if you have an iPhone and you haven’t changed it in the “IP Camera Lite” app.
    • VIDEO_STREAM_URL – This is the URL you noted from the previous step.  If you have an iPhone, you would most likely only need to replace the IP address part of the URL.


Here’s the code:

import time
import cv2
import threading
import requests
from ultralytics import YOLO

# Configuration - customize these for your environment
VIRTUAL_SENSOR_IP_ADDRESS = "192.168.1.100"	 # your QikEasy Virtual Wireless Sensor's IP address
VIDEO_URL_USERNAME = "admin"	 # default for IP Camera Lite
VIDEO_URL_PASSWORD = "admin"	 # default for IP Camera Lite
VIDEO_STREAM_URL = f"http://{VIDEO_URL_USERNAME}:{VIDEO_URL_PASSWORD}@192.168.1.101/video"	 # your phone's stream URL

# Constant
VIDEO_FRAMES_TO_SKIP = 10	 # the no. of frames we skip for each frame we will process


# Global variables for communication between threads
lock = threading.Lock()
obj_location = None

# Function to send an HTTP request with the location of the detected object
def send_request():
	global obj_location

	while True:
		with lock:
			x_percent = 0
			y_percent = 0
			w_percent = 0
			h_percent = 0
			if obj_location is not None:
				x_percent, y_percent, w_percent, h_percent = obj_location
			url = f"http://{VIRTUAL_SENSOR_IP_ADDRESS}/set?t=c&r={round(x_percent * 1024 / 100)}&g={round(w_percent * 1024 / 100)}&b={round(h_percent * 1024 / 100)}"
			#print(f"Sending request to {url}")
			obj_location = None
		try:
			requests.get(url, timeout=1)	 # send the location to the Virtual Wireless Sensor
		except requests.RequestException:
			pass	 # ignore transient network errors and keep going
		time.sleep(0.1)	 # throttle so we don't flood the adapter

# Load the pre-trained YOLOv8 model
model = YOLO("yolov8n.pt")

# Open a video capture object
cap = cv2.VideoCapture(VIDEO_STREAM_URL)
#cap = cv2.VideoCapture(0)	 # uncomment to test with your computer's built-in webcam instead

# Start the secondary thread to send HTTP requests
thread = threading.Thread(target=send_request)
thread.daemon = True
thread.start()

frameCount = 0

while True:
	# Read a frame from the video capture
	ret, frame = cap.read()

	if not ret:
		break	 # stop when no more frames can be read

	frameCount = frameCount + 1

	if (frameCount % VIDEO_FRAMES_TO_SKIP) != 0:
		#print( f"{frameCount} Skip" )
		continue	 # skip this frame without processing it

	(h,w) = frame.shape[:2]

	frameCount = 0

	# Flip the image so that it will display as a mirror image
	flipped_frame = cv2.flip(frame, 1)	 

	# Perform object detection
	results = model.predict(flipped_frame)

	# Iterate over the detected objects and draw the text on the bounding box
	for box in results[0].boxes:
		# Get the class label and confidence score
		classification_index = box.cls
		class_label = model.names[ int(classification_index) ]
		confidence = box.conf[0]

		# Draw the bounding box on the frame
		obj = box.xyxy[0]
		#if class_label=="bottle" and confidence > 0.5:
		if confidence > 0.5:
			cv2.rectangle(flipped_frame, (int(obj[0]), int(obj[1])), (int(obj[2]), int(obj[3])), (0, 255, 0), 2)

			# calculate center_x, center_y
			obj_center = (int((obj[0] + obj[2]) / 2), int((obj[1] + obj[3]) / 2))
			obj_size = (int(obj[2] - obj[0]), int(obj[3] - obj[1])) 
			x_loc_percent = int((obj_center[0] / w) * 100)
			y_loc_percent = int((obj_center[1] / h) * 100)
			w_percent = int((obj_size[0] / h) * 100 )	   # note that this is normalized as a % of the frame's height
			h_percent = int((obj_size[1] / h) * 100 )  

			#print(x_loc_percent, y_loc_percent, w_percent, h_percent)

			with lock:
				obj_location = (x_loc_percent, y_loc_percent, w_percent, h_percent)

			# Draw the text on the bounding box
			text = f"{class_label}: {confidence:.2f}"
			cv2.putText(flipped_frame, text, (int(obj[0]), int(obj[1]) - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
			cv2.putText(flipped_frame, f"({w_percent}%, {h_percent}%) @ ({x_loc_percent}%, {y_loc_percent}%)", (int(obj[0]), int(obj[1]) - 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

	# Show the output frame
	cv2.imshow("Frame", flipped_frame)
	if cv2.waitKey(1) & 0xFF == ord('q'):
		break

# Release the video capture object and close all windows
cap.release()
cv2.destroyAllWindows()

Program running on LEGO Hub


On the LEGO robot, a Word Block program will be running to receive the X position, plus the width and height, of the detected object relative to the video frame (sent as the R, G and B values of the Virtual Color Sensor).  Based on the position and size, it determines how to turn the robot.  In addition, the robot moves forward or backward based on the viewed size of the bottle, with the goal of making the viewed size match the desired size.

The Word Block Program


First of all, you will need to customize the code to suit your environment:

    • If you are running your program on Spike App 2 or Robot Inventor, change the block immediately after “When Program Starts” to set the max_red value to 255 (instead of 1024).  This is because inside Word Blocks, the range for red, green and blue is 0 to 1024 in Spike App 3 but 0 to 255 in Spike App 2.
    • There are a number of parameters you will need to tune to make your robot move swiftly and accurately without overshooting.  See the “Robot Movements” section below for details.


These are the values sent to our Word Block program through the Virtual Color Sensor:

    • “red” -> X position of the horizontal center point of the detected object, represented as a value in the range 0 to 1024, proportional to the camera’s visible image frame as if its size were 1024 x 1024.
    • “green” ->  width of the detected object represented as a number in the range of 0 to 1024.
    • “blue” ->  height of the detected object represented as a number in the range of 0 to 1024.
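On the computer side, these raw values come from scaling the detected percentages onto the 0 to 1024 range, the same arithmetic used when the Python script builds the request URL.  The helper below is our own illustration of that mapping:

```python
def encode_sensor_values(x_percent, w_percent, h_percent, max_raw=1024):
    """Scale 0-100 percentages onto the Virtual Color Sensor's 0-1024 raw range (r, g, b)."""
    return (round(x_percent * max_raw / 100),
            round(w_percent * max_raw / 100),
            round(h_percent * max_raw / 100))

# A bottle centered horizontally (50%), 25% wide and filling the full frame height
print(encode_sensor_values(50, 25, 100))  # -> (512, 256, 1024)
```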


The logic for our Word Block program is very simple:


    • Based on the Virtual Color Sensor’s raw red color value, we first calculate the X position of the detected bottle as a percentage of the raw value’s maximum range.
    • Make sure the X position percentage value is not zero.   Zero usually means no bottle is detected.
    • If the bottle’s X % position falls between 45% and 55%, we say it is centered and no movement is required.
    • If the X position is not centered, the code calls the “turnMyRobot” function with the difference between the X value and the center position (i.e. 50).
      • This value can be positive or negative, indicating whether the object is too far to the left or too far to the right.
      • The turnMyRobot function uses this delta value to set the speed and the rotation amount (in degrees).  Note that the sign of the rotation amount determines the direction of the movement.
      • Since there is a lag between the live camera video and when the computer receives and processes the stream, after the robot makes a rotation movement, we give the video a moment to catch up.  This is why there is a 1-second wait after the motor movement block.
    • Once the x position is centered, the code will adjust its distance from the bottle by moving either forward or backward:
      • It first reads the object size using the height value passed in as the “blue” color.  It then converts the value into a percentage and calls it current_obj_size.
      • When current_obj_size is not within the desired object size ±1%, it moves the car forward or backward accordingly, with the goal of reaching the desired object size.
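The decision logic above can be sketched in Python.  This is our own illustrative pseudocode, not the Word Block program itself; the function and variable names and the desired-size value are ours, while the 45-55% window and ±1% tolerance come from the description:

```python
def decide_action(raw_red, raw_blue, max_raw=1024, desired_size=30):
    """Return (action, magnitude) from the Virtual Color Sensor's raw values.

    raw_red encodes the bottle's X center, raw_blue its height (both 0-1024).
    desired_size is a hypothetical target height percentage.
    """
    x_percent = raw_red * 100 / max_raw
    if x_percent == 0:
        return ("wait", 0)                 # zero usually means no bottle detected
    if not (45 <= x_percent <= 55):
        return ("turn", x_percent - 50)    # signed delta from center drives turnMyRobot
    size_percent = raw_blue * 100 / max_raw
    delta = size_percent - desired_size
    if abs(delta) <= 1:
        return ("stay", 0)                 # centered and at the desired distance
    return ("move", -delta)                # too big -> back up, too small -> advance

print(decide_action(512, 307))  # centered, ~30% size -> ('stay', 0)
```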


Robot Movements:


There are a number of parameters you can tune:


  1. Rotation movement speed.  The current value is set to ABS( X_Position_Delta * 0.5 ).  You can raise or lower the multiplying factor to make the robot turn faster or slower.
  2. Rotation amount.  The current value is set to X_Position_Delta * 1.9.  Making the factor too big may cause your robot to spin too far and over-swing.  Making it too small may make your robot turn too slowly, because multiple movements would be needed to reach the center point.
  3. Wait time between rotation movements.  The current value is set to 1 second.  If this time is too long, the response becomes too slow.  If it is too short, there is a good chance your program will think the turn wasn’t enough (even when it was) and command another movement, causing the rotation to overshoot.
  4. Forward/backward movement speed.  The current value is set to ABS( size delta * -1 )%.  You can raise or lower the multiplying factor to make the robot move faster or slower.
  5. Forward/backward movement amount.  The current value is set to ABS( size delta * 0.5 ) cm.  Making the factor bigger causes your robot to make larger movements; making it smaller causes it to take smaller steps.


For the speed and amount numbers, making them too big will cause the movement to overshoot, so the next iteration has to move back in the opposite direction, causing the robot to oscillate.  Making these numbers too small will make the robot too slow and its movement jerky.
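With the factors quoted above, the rotation parameters work out as follows.  This is only a sketch of the arithmetic (the actual Word Block program uses blocks, not Python):

```python
def turn_parameters(x_position_delta, speed_factor=0.5, amount_factor=1.9):
    """Compute rotation speed and amount from the X delta, using the factors above."""
    speed = abs(x_position_delta * speed_factor)   # speed is always positive
    degrees = x_position_delta * amount_factor     # sign sets the turn direction
    return speed, degrees

print(turn_parameters(20))   # bottle 20% right of center -> (10.0, 38.0)
print(turn_parameters(-20))  # bottle 20% left of center -> (10.0, -38.0)
```

Raising speed_factor or amount_factor makes turns faster or larger, at the cost of the overshoot and oscillation described above.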




Challenge Extensions


These are some of our suggestions for extending this project:


  • (Intermediate) Let’s add another degree of freedom to our robot’s camera.  You may build a motorized platform for holding the smartphone.  The motor can angle the phone upward and downward such that it always tries to center the object in the phone’s image.
  • (Advanced) It is possible to train YOLOv8 to recognize a custom object.  You may refer to this YouTube video (https://www.youtube.com/watch?v=TRMZCsBfX78) to learn how.



QikEasy Adapter’s Virtual Wireless Sensor provides boundless integration opportunities with all sorts of data sources available over the network.  This project presents only one of its possible uses.  You may visit our Virtual Wireless Sensor documentation page for more fun and interesting ideas.