Face Tracking Robot

Introduction

 

In this demo, we will showcase how QikEasy’s Virtual Wireless Sensor feature can be used together with your desktop computer to perform computing-intensive tasks that cannot normally be done by a LEGO robot.  The project uses the mobile phone’s camera (mounted on the robot) as video input, uses the desktop computer to perform face detection, and then sends the detected face’s location to your robot via QikEasy’s Virtual Wireless Sensor functionality.  The robot makes calculated rotations so that it always faces towards the human face it has detected.

 

Architectural Overview

 

Here’s a flow chart showing the relevant components and how data flows between them:

 

Required Materials

 

To follow this tutorial, you must prepare these before starting the project:

 

    • Your LEGO Spike Prime or Robot Inventor system.

 

    • An iPhone or an Android phone

 

    • Your QikEasy Adapter must be set up to run in Virtual Wireless Sensor mode.  See instructions on how to set that up here.

 

    • A desktop or laptop computer set up with Python and the OpenCV library for Python.  You may install OpenCV with the command:
      pip install opencv-python

 

    • You may also need to install the “requests” module for sending HTTP requests if you don’t already have it:
      pip install requests

 

Your smartphone, the computer, and the QikEasy Adapter should all be connected to the same Wi-Fi network.

Building your LEGO Robot

 

Please refer to the YouTube video above for detailed instructions on how to build the robot.  Note that the instructions are for an iPhone XS Max.  Depending on your phone’s size, you may have to try different LEGO brick configurations to make your phone fit.

 

DISCLAIMER:  You should also make sure you secure your phone properly. We will not be responsible for any damage to your smartphone.

 

 

The Video Stream

 

The robot relies on the mounted smartphone to stream video of what the robot sees to the computer.  Your phone will act as an IP webcam.  Depending on your phone’s OS, there are different free apps we found that will turn your phone into an IP webcam.

 

For iPhones:

 

 

The app that works for us is called “IP Camera Lite”.

 

To set up the MJPEG stream on your iPhone, download and open “IP Camera Lite” from the App Store.

    • Once opened, click “Turn on IP Camera Server” at the bottom.
    • Click the “Connect?” link at the bottom, and you will see information about the URLs for the different protocols.
    • Enter the MJPEG URL in a new browser tab to make sure it works.  When prompted for a username and password, enter admin and admin.
    • If it works, note down the video stream URL, which will be needed in the Python script later.  The video URL will be in the form “http://{VIDEO_URL_USERNAME}:{VIDEO_URL_PASSWORD}@{iPhone’s IP Address}/video”, where the default username and password are both “admin”.
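
Before using the URL in the main script, you can quickly verify it with a few lines of OpenCV.  This is only a minimal sketch; the IP address and port shown are placeholders, so substitute the values reported by the app:

import cv2

VIDEO_URL_USERNAME = "admin"      # default in IP Camera Lite
VIDEO_URL_PASSWORD = "admin"
VIDEO_STREAM_URL = f"http://{VIDEO_URL_USERNAME}:{VIDEO_URL_PASSWORD}@192.168.0.100:8081/video"  # placeholder IP/port

capture = cv2.VideoCapture(VIDEO_STREAM_URL)
ok, frame = capture.read()
if ok:
    print(f"Stream OK: received a {frame.shape[1]}x{frame.shape[0]} frame")
else:
    print("Could not read from the stream - check the URL, username and password")
capture.release()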

 

For Android Phones:

 

You may use the app called “IP Webcam”, as demonstrated by this video.

The OpenCV Python Code

 

Preparation:

 

Before you start, download the following two files (deploy.prototxt and res10_300x300_ssd_iter_140000_fp16.caffemodel) from this Git repository:  https://github.com/spmallick/learnopencv/tree/master/FaceDetectionComparison/models

 

 

Place these files in the same directory as your Python code.

 

 

Key Points To Note:

 

    • We use OpenCV’s Deep Learning Face Detector (instead of the default Haar Cascades face detector) for performing face detection.  The deep learning face detector provides significantly more accurate face detection capability, as noted in this article.

 

    • We cannot perform HTTP GET requests synchronously within the loop that captures and processes each frame, because waiting for each HTTP response would slow the loop down too much.  Instead, we run the HTTP request loop in a separate thread, which keeps the video frame processing from lagging.

 

    • Because the IP camera’s MJPEG video stream buffers multiple frames, there will be an obvious lag in the viewed video if the viewer doesn’t fetch frames fast enough.  Since it has to process every frame, the Python script cannot keep up with the streamed video buffer, and with the original development version of the code the output video exhibited an obvious delay of 5 seconds or more.  The simplest way we found to solve this problem is to skip several frames for each frame we run face detection on.  The current code processes only 1 out of every 5 frames (controlled by the VIDEO_FRAMES_TO_SKIP variable).  You may try changing this variable to see its effect on lag and performance; a simplified sketch of the frame-skipping pattern follows this list.
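
As a simplified illustration of this frame-skipping pattern (the full program below does the same thing, plus face detection and the sender thread), a stripped-down version of the loop looks roughly like this; the webcam index 0 is just a placeholder video source:

import cv2

VIDEO_FRAMES_TO_SKIP = 5          # process only 1 out of every 5 frames

capture = cv2.VideoCapture(0)     # placeholder source; the real script opens the MJPEG URL
frame_count = 0
while True:
    ok, frame = capture.read()    # every frame is still read, which keeps the stream buffer drained
    if not ok:
        break
    frame_count += 1
    if frame_count % VIDEO_FRAMES_TO_SKIP != 0:
        continue                  # discard this frame without any heavy processing
    frame_count = 0
    # ... face detection and display would happen here, on 1 of every 5 frames ...
capture.release()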

 

Customizing Settings for your Environment:

 

You should customize these settings before you run the program:

    • VIRTUAL_SENSOR_IP_ADDRESS – This should be set to your QikEasy Virtual Wireless Sensor’s IP address.
    • VIDEO_URL_USERNAME – This should have the value “admin” if you have an iPhone and you haven’t changed it in the “IP Camera Lite” app.
    • VIDEO_URL_PASSWORD – This should have the value “admin” if you have an iPhone and you haven’t changed it in the “IP Camera Lite” app.
    • VIDEO_STREAM_URL – This is the URL you noted from the previous step.  If you have an iPhone, you would most likely only need to replace the IP address part of the URL.
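
Optionally, before running the full program you can confirm that the computer can reach your Virtual Wireless Sensor.  The sketch below sends a single test value using the same /set?t=c&r= endpoint that the main script uses (the IP address is a placeholder):

import requests

VIRTUAL_SENSOR_IP_ADDRESS = "192.168.0.135"   # placeholder - use your adapter's IP address

# A raw red value of 512 corresponds to a face at roughly the 50% (centered) position
test_url = f"http://{VIRTUAL_SENSOR_IP_ADDRESS}/set?t=c&r=512"
response = requests.get(test_url, timeout=5)
print(f"Virtual sensor responded with HTTP {response.status_code}")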

 

Here’s the code:

import time
import cv2
import threading
import requests
import numpy as np

# Constant
VIDEO_FRAMES_TO_SKIP = 5    # the no. of frames we skip for each frame we will process
VIRTUAL_SENSOR_IP_ADDRESS = "192.168.0.135"

VIDEO_URL_USERNAME = "admin"
VIDEO_URL_PASSWORD = "admin"
VIDEO_STREAM_URL = f"http://{VIDEO_URL_USERNAME}:{VIDEO_URL_PASSWORD}@10.76.13.21:8081/video"

# Global variables for communication between threads
lock = threading.Lock()
face_location = None

# Function to detect faces in a frame and update the face_location variable
def detect_faces(frame):
    global face_location

    # Load the pre-trained deep learning face detector model
    face_detector = cv2.dnn.readNetFromCaffe(
        "deploy.prototxt",
        "res10_300x300_ssd_iter_140000_fp16.caffemodel"
    )

    # Resize the frame for processing
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0, (300, 300), (104.0, 177.0, 123.0))

    # Pass the blob through the network and obtain the face detections
    face_detector.setInput(blob)
    detections = face_detector.forward()

    with lock:
        face_location = None

    # Loop over the face detections and update the face_location variable with the location of the first detected face
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > 0.5:
            box = detections[0, 0, i, 3:7] * np.array([frame.shape[1], frame.shape[0], frame.shape[1], frame.shape[0]])
            (startX, startY, endX, endY) = box.astype("int")

            # Draw the bounding box and label on the image
            cv2.rectangle(frame, (startX, startY), (endX, endY), (0, 255, 0), 2)

            # Calculate x_percent and y_percent relative to frame size
            face_center = (int((startX + endX) / 2), int((startY + endY) / 2))
            x_percent = int((face_center[0] / w) * 100)
            y_percent = int((face_center[1] / h) * 100)

            # Write the (x, y) percentage location on the image
            cv2.putText(frame, f"{confidence*100:.2f}%  ({x_percent}%, {y_percent}%)", (startX, startY - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)

            with lock:
                face_location = (x_percent, y_percent)

# Function to send an HTTP request with the location of the detected face
def send_request():
    global face_location

    while True:
        x_percent = 0
        with lock:
            if face_location is not None:
                x_percent, y_percent = face_location
                face_location = None

        # Build the URL and send the request outside the lock so the
        # face detection code is never blocked waiting on the network
        url = f"http://{VIRTUAL_SENSOR_IP_ADDRESS}/set?t=c&r="
        myUrl = f"{url}{round(x_percent * 1024 / 100)}"
        #print(f"Sending request to {myUrl}")

        requests.get(myUrl)
        time.sleep(0.5)

# Open the default camera
camera = cv2.VideoCapture(VIDEO_STREAM_URL)

# Start the secondary thread to send HTTP requests
thread = threading.Thread(target=send_request)
thread.daemon = True
thread.start()

frameCount = 0
# Loop over the frames from the camera
while True:
    # Read a frame from the camera
    ret, frame = camera.read()

    # Stop if a frame could not be read from the stream
    if not ret:
        break

    frameCount = frameCount + 1

    if (frameCount % VIDEO_FRAMES_TO_SKIP) != 0:
        #print( f"{frameCount} Skip" )
        continue

    frameCount = 0
    
    # Flip the image so that it will display as a mirror image
    flipped_frame = cv2.flip(frame, 1)    

    # All the heavy duty processing is done here
    detect_faces( flipped_frame )

    # Display the output frame
    cv2.imshow("Output", flipped_frame)

    # Check for the "q" key to quit
    if cv2.waitKey(1) == ord("q"):
        break

# Clean up
camera.release()
cv2.destroyAllWindows()

Program running on LEGO Hub

 

On the LEGO robot, a Word Block program runs to receive the position of the detected face from the computer.  Based on that position, it determines how to turn the robot.

 

The Word Block Program

 

First of all, you will need to customize the code to suit your environment:

 

    • If you are running your program on Spike App 2 or Robot Inventor, change the block immediately after “When Program Starts” to set the max_red value to 255 (instead of 1024).
    • There are a number of parameters you will need to tune to make your robot turn swiftly and accurately.  See the “Tuning the Robot” section below for details.

 

The logic for our Word Block program is very simple:

    • Based on the Virtual Color Sensor’s raw red color value, we first calculate the X position of the face as a “percentage” relative to the maximum range of the raw value.
    • Make sure the X position percentage is not zero.  Zero usually means no face is detected.
    • If the face’s X position falls between 45% and 55%, we consider it centered and no movement is required.
    • Otherwise, the code goes into the IF condition and immediately calls the “turnMyRobot” function with the delta of the X value from the center position (i.e. 50).  A Python sketch of this logic is shown after this list.
      • This delta can be either positive or negative, indicating whether the face is too far to the left or too far to the right.
      • The turnMyRobot function uses this delta to set the speed and the rotation amount (in degrees).  Note that the sign of the rotation amount determines the direction of the movement, and its magnitude the amount.
      • Since there is a lag between the actual camera video and when the computer receives and processes the stream, we give the video a bit of time to catch up after the robot makes its rotation.  This is why there is a 0.8 second wait after the motor movement block.
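
The actual program is written in Word Blocks; for clarity, here is a rough Python sketch of the same decision logic.  The on_sensor_update handler is a hypothetical stand-in for the blocks that react to the Virtual Color Sensor’s raw red value, and the motor movement itself is represented by a print; the numeric parameters are the ones discussed in the Tuning section below.

import time

MAX_RED = 1024                      # use 255 instead on Spike App 2 / Robot Inventor

def turnMyRobot(delta):
    speed = abs(delta * 0.4)        # movement speed
    degrees = delta * 1.9           # signed rotation amount; the sign sets the turn direction
    print(f"turn at speed {speed:.1f} for {degrees:.1f} degrees")   # motor movement block goes here
    time.sleep(0.8)                 # give the lagging video a bit of time to catch up

def on_sensor_update(raw_red):
    # Convert the raw red value back into an X position percentage
    x_percent = raw_red * 100 / MAX_RED
    if x_percent == 0:
        return                      # zero usually means no face is detected
    if 45 <= x_percent <= 55:
        return                      # the face is roughly centered; no movement needed
    turnMyRobot(x_percent - 50)     # delta from the 50% center position

# Example: a raw red value of 717 is roughly the 70% position -> turn at speed ~8 for ~38 degrees
on_sensor_update(717)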

 

Tuning the Robot:

 

There are 3 parameters you can tune:

 

  1. Movement speed.  Currently set to ABS( X_Position_Delta * 0.4 ).  You can adjust the multiplying factor to a bigger or smaller number to make the robot turn faster or slower.
  2. Rotation amount.  Currently set to X_Position_Delta * 1.9.  Making the factor too big may cause your robot to spin too far and overshoot.  Making it too small may make your robot turn too slowly, because it would require multiple movements to reach the center point.
  3. Wait time.  Currently set to 0.8 seconds.  If this time is too long, the response becomes sluggish.  If it is too short, there is a good chance the program will think the turn wasn’t enough (even if it was) and command another movement, causing the rotation to overshoot.

 

Challenge Extensions

 

These are some of our suggestions for extending this project:

 

  • Handle the case where multiple faces are present
  • Make the robot follow the face, moving forward and backward based on the size of the face detection rectangle

Conclusion

 

QikEasy Adapter’s Virtual Wireless Sensor provides boundless integration opportunities with all sorts of data sources available over the network.  This project presents only one of its possible uses.  You may visit our Virtual Wireless Sensor documentation page for more fun and interesting ideas.