…and our commitment to helping you stay fit – anytime and anywhere.
Behind our motivation to help you stay fit at home is a plethora of technologies: a video streaming stack that can operate at the scale of a live cricket match, an advanced scoring algorithm that processes the camera feed and scores you based on your movements, and a real-time scoring/ranking system that lets you invite and compete with your best friends! In this post, we will look at every aspect of how we made cult.live classes possible and the lessons we learnt over the past few months.
The Energy Meter
At the heart of delivering an engaging experience through our live classes is the Energy Meter. There is a lot going on behind those colourful little bars that fill in as the intensity of your workout increases. Even better, it constantly reminds you to push harder and further to be more active.
When building the Energy Meter, we had to answer many questions, the most important of which was: how do we determine the intensity level of your workout?
The first thing that came to mind was a wrist band. But as we dug deeper with the goal of making fitness more accessible, we came up with a few other solutions.
Think of the Energy Meter as a virtual movement tracker. It combines AI technologies like Computer Vision and Deep Learning to understand how users move during the workout, how accurately they are performing the exercises, and approximately how much energy they are burning while moving a particular body part.
Generally, such Deep Learning models are run on servers. Running our AI models server-side would mean uploading the user's video stream from their mobile to our servers during the session and sending back the calculated values. Assuming 3,00,000 users join a session, anywhere from 300-1000 high-end GPU-based servers would be needed to process the images in real-time. So running AI models on servers is not a feasible solution for us, for the following reasons:
- Our users’ privacy is of the utmost importance to us. Some of our users might not be comfortable with sending their personal video stream to our servers
- Energy Meter should give instantaneous feedback to the user. A server-side solution will have a significant latency between the user’s action and feedback
- The server-side solution increases the cost for the user as well as for us significantly. Users are charged for the upload bandwidth by the data providers, and cure.fit has to maintain a large number of dedicated servers for processing these images.
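To put that server estimate in perspective, here is a back-of-the-envelope calculation; the per-GPU throughput figures are illustrative assumptions, not measured numbers from any production fleet:

```python
def gpu_servers_needed(concurrent_users, fps, inferences_per_gpu_per_sec):
    """Servers needed to score every user's frames in real time."""
    total_inferences_per_sec = concurrent_users * fps
    # Ceiling division: a fractional server still means one more machine.
    return -(-total_inferences_per_sec // inferences_per_gpu_per_sec)

# 3,00,000 concurrent users, 10 frames scored per second per user.
# Assumed throughput of 3,000-10,000 pose inferences/sec per GPU server.
fast = gpu_servers_needed(300_000, 10, 10_000)   # -> 300 servers
slow = gpu_servers_needed(300_000, 10, 3_000)    # -> 1,000 servers
```

Three million inferences per second is a lot of dedicated hardware to keep warm for a one-hour class.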
The Challenges of On-device ML Inference
Alongside the privacy and the latency benefits that we got by running machine learning on mobile devices, we also had to deal with a set of challenges.
Each device features a different architecture and different capabilities. Thus, we had to support multiple SDKs to ensure the best experience on each device. Below is the SDK we use for each platform:
| Platform | ML inference SDK |
| --- | --- |
| iPhone | CoreML Vision Framework |
| Android + Qualcomm Snapdragon chips | Snapdragon Neural Processing Engine |
| Android + MediaTek/Kirin/Exynos | TensorFlow Lite |
How We Optimised the Model to Run in Real-Time
Phone processors and memory are far inferior to what is available in cloud servers. Additionally, unlike a server environment with effectively infinite resources, most devices run on battery while users are working out.
To run an ML model in real-time, it must finish processing one image frame before the next one arrives. In our case, we had to hit a baseline where 10 images could be processed and scored every second. This enabled the system to recognise the most common workout movements and assign an energy value and score to them.
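A minimal sketch of this frame budget, assuming a hypothetical `infer` function and timestamped frames (this is an illustration of the pacing idea, not our production pipeline):

```python
import time

# At 10 fps the model has a 100 ms budget per frame. If inference overruns,
# we drop the frames that arrived in the meantime rather than queue them,
# so feedback stays tied to what the user is doing right now.
FRAME_BUDGET_S = 1 / 10  # 100 ms per frame

def run_realtime(frames, infer):
    """Score (timestamp, frame) pairs, dropping frames that arrive mid-inference."""
    scores, next_deadline = [], 0.0
    for timestamp, frame in frames:
        if timestamp < next_deadline:
            continue  # still busy with the previous frame: drop this one
        start = time.monotonic()
        scores.append(infer(frame))
        elapsed = time.monotonic() - start
        # The next frame we accept must be past both the fixed budget and
        # however long inference actually took.
        next_deadline = timestamp + max(elapsed, FRAME_BUDGET_S)
    return scores
```

Dropping rather than queueing is what keeps the Energy Meter's feedback instantaneous on slow devices.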
In addition, we found that our model ran far better on the latest smartphones than on older ones. But as our goal was to make fitness more accessible regardless of the smartphone, we had to work around this issue. Thus, we tuned the network to do fewer calculations while still giving a great output. For instance, if you take a closer look at any state-of-the-art pose detection deep learning model, it will have close to 5 million parameters. In our case, we had to optimise our model to give better accuracy at a tenth of the parameters.
Currently, our model achieves 100ms inference time on low-cost processors like the Qualcomm Snapdragon 427. On a high-end processor like the Snapdragon 855, this goes down to 12ms!
Pushing Further, Increasing Accuracy
If you speak to any photographer, they will tell you that the key to a great shot is great lighting. A well-lit environment is just as important for accurately detecting things in a camera frame, so people who attend our live classes should be in a well-lit room for the Energy Meter to deliver accurate results. The truth, however, is that people often work out indoors with minimal lighting, mainly from tube lights operating at varied frequencies and with a certain tint to their colour.
Existing solutions like PoseNet and OpenPose predict 14 key body points on any input image. These work great on big GPU servers and on casual images of people doing various outdoor activities. But in our use-case, we have dimly lit rooms and people working out in varied postures, captured by low-cost mobile devices.
This meant we had to build our own dataset of images to train models on. To improve model accuracy, we worked on two parallel streams: architectural modifications to state-of-the-art deep learning models, and annotation of complex workout poses in the shortest span of time.
After nearly 100 iterations of different model architectures, we came up with a novel architecture that beat all our benchmarks for accuracy and speed. Data labelling used to be a mammoth task for us until we created an AI-assisted data labelling pipeline into which we fed relevant workout videos; an in-house gamified tool then let annotators label only the edge cases in an intuitive manner. Nevertheless, there were still many improvements we aimed to make in detecting various poses.
Keeping Score – the Art of Measuring Workout Intensity
The ML model inference step only determines the position of the hands and legs in a given image. We still had to track these positions across frames, determine how hard the movements were, and assign a proportional energy value.
To stay true to physics, we decided to simulate it. For instance, if the head moves up and down over the course of a few seconds, we assign it a higher energy level than when the head stays still during the exercise movement.
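A toy version of this idea applies kinetic energy to keypoint displacements between frames; the keypoint names, mass fractions and formula below are illustrative assumptions, not our production model:

```python
import math

# Each frame gives (x, y) positions for body keypoints. The energy proxy
# for a body part follows kinetic energy, E = 1/2 * m * v^2, so faster
# movement of a heavier part contributes more.
BODY_PART_MASS = {"head": 0.08, "hand": 0.05, "leg": 0.17}  # rough fractions

def movement_energy(prev_frame, curr_frame, dt):
    """Total energy proxy between two frames of keypoint positions."""
    energy = 0.0
    for part, (x0, y0) in prev_frame.items():
        x1, y1 = curr_frame[part]
        speed = math.hypot(x1 - x0, y1 - y0) / dt  # displacement / time
        energy += 0.5 * BODY_PART_MASS[part] * speed ** 2
    return energy

# A still head scores zero; a head that bobs 0.2 units in 0.1 s scores more.
still = movement_energy({"head": (0, 0)}, {"head": (0, 0)}, 0.1)
moving = movement_energy({"head": (0, 0)}, {"head": (0, 0.2)}, 0.1)
```

Summing this per-frame energy over a class gives a monotone "effort" signal that can drive the bars of the Energy Meter.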
Scalable Live Streaming Architecture
When the COVID-19 pandemic hit India and the lockdown was enforced, traffic on cure.fit live classes spiked to 30x the usual volume within a week. Within a span of three weeks, the team had to re-architect the system to enable nearly 3,00,000 users to attend a class at the same time.
Presently, the server stack plays 3 key roles:
- Data Writes: Collect and store data like user scores and playback metrics.
- Data Reads: Read and return data to the app to be displayed.
- Data Processing: Process streamed data in real-time like ranking the users.
Data Write Architecture
To ensure our users enjoy a great experience, it was important that we kept an eye on the metrics of both streaming and Energy Meter performance. To achieve this, the app frequently sends data to our servers; this write traffic is among the highest across the various APIs at cure.fit. For it to scale well, the write path was architected as follows:
When a class starts, the write traffic spikes fast. Though auto-scaling does kick in, it takes some time to add more machines. By offloading all the heavy processing to a queue, the server can manage the initial spike well.
Furthermore, the use of Redis keeps latencies and server load low during a class. Lastly, the data is synced periodically to MongoDB for post-class access. This architecture gives us good scale at low cost on the write path.
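The flow can be sketched like this; the class and method names are stand-ins, and in-memory dicts replace the real Redis and MongoDB clients to keep the sketch self-contained:

```python
import queue

class WritePath:
    def __init__(self):
        self.work = queue.Queue()   # absorbs the spike at class start
        self.redis = {}             # hot store during the class
        self.mongo = {}             # durable store for post-class reads

    def record_score(self, user_id, score):
        # The API handler only enqueues. This is cheap, so it survives the
        # initial traffic spike while auto-scaling catches up.
        self.work.put((user_id, score))

    def drain(self):
        # A background worker applies queued writes to the hot store.
        while not self.work.empty():
            user_id, score = self.work.get()
            self.redis[user_id] = max(self.redis.get(user_id, 0), score)

    def sync_to_mongo(self):
        # Runs periodically; after the class, reads shift to MongoDB.
        self.mongo.update(self.redis)
```

The key property is that the request handler does no heavy work: everything expensive happens behind the queue at the workers' own pace.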
Data Read Architecture
In the read path, the data access pattern differs during the class and post-class. During the class, the data size accessed is small (limited to the user's class data), but the read throughput is quite high. Post-class, on the other hand, the data size accessed is large (it includes all the user's past classes), but the throughput is low.
The architecture was designed with this in mind. The cure.fit server first reads from Redis; if it does not find the necessary data there, it reads from MongoDB. This ensures that during the class all data access is served from Redis, which enables high throughput and scale, while post-class the data access shifts to MongoDB. This strikes the right balance between scale, latency and cost.
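In caching terms this is the cache-aside pattern. A minimal sketch, with dicts standing in for the Redis and MongoDB clients and hypothetical key names:

```python
# In-class data lands in the "Redis" dict; historical data lives in "MongoDB".
redis_store = {}
mongo_store = {"u1:history": ["class-1", "class-2"]}

def read(key):
    """Serve hot in-class data from Redis; fall back to MongoDB otherwise."""
    if key in redis_store:
        return redis_store[key]      # hot path during the class
    return mongo_store.get(key)      # cold path for post-class history
```

During the class every key a user touches is already in Redis, so MongoDB sees almost no read traffic until the class ends.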
Typically, server-side processing for gamification activities during a class, such as ranking users, scales super-linearly with the number of users. Typical games avoid this issue by assigning users to small rooms/groups and performing game processing only within those groups. However, breaking users into small groups was not an option for us, as it impacts the social aspect of the cult.live experience. For example, even though 3000 people are working out together in the same live class, a small group would mean you are ranked amongst, say, 30 or 100, which does not give the feeling of working out with a larger crowd.
cure.fit’s server uses a custom score bucket based ranking algorithm that scales linearly with the number of users. Every user score update is O(1), current user rank fetch is O(1). The algorithm prioritises generating an estimated rank, with the actual rank being updated every 10 seconds.
This helped us strike a nice balance between performance, latency and user experience.
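A sketch of the score-bucket idea follows; our production algorithm differs in its details, and the bucket width and class below are illustrative assumptions. Scores are quantised into fixed-width buckets, so an update touches one counter, and a rank estimate scans only the fixed number of buckets, independent of the number of users:

```python
BUCKET_WIDTH = 10
NUM_BUCKETS = 100  # covers scores 0..999

class BucketRanker:
    def __init__(self):
        self.counts = [0] * NUM_BUCKETS
        self.user_bucket = {}

    def update(self, user_id, score):
        """O(1): move the user's count from their old bucket to the new one."""
        b = min(score // BUCKET_WIDTH, NUM_BUCKETS - 1)
        old = self.user_bucket.get(user_id)
        if old is not None:
            self.counts[old] -= 1
        self.counts[b] += 1
        self.user_bucket[user_id] = b

    def rank(self, user_id):
        """Estimated rank: 1 + users in strictly higher buckets.
        O(NUM_BUCKETS), which is constant in the number of users."""
        b = self.user_bucket[user_id]
        return 1 + sum(self.counts[b + 1:])
```

The rank is approximate (users in the same bucket tie), which is exactly the trade-off described above: an instant estimate, refined periodically.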
Building App for Performance
At the end of the day, it all boils down to the app running smoothly while doing many things simultaneously: streaming the live session without buffering, controlling the camera, running model inference, computing Energy Meter scores, sending and reading data from the server, and displaying all of this beautifully with animations but without lag or stutter.
Quite a huge feature list for any app! In this blog, we have only covered the optimisations we performed on top of the Live Streaming stack; we will cover the stack itself in detail in a separate blog.
Optimisations for Streaming Live Video
In India, networks, especially cellular, can be unreliable. A connection can work brilliantly at 40Mbps one second and drop the next. To tackle such small disconnections and present a smooth playback experience to our users, it was important that we enabled pre-downloading of video segments.
At cure.fit, we used HLS to live stream content. According to HLS (HTTP Live Streaming) protocol:
- Video players periodically re-download an “m3u8” playlist file. In the example shown below, the player re-downloads the m3u8 file every 10 seconds.
- This file contains a list of downloadable video segment URLs for playback. In the example shown below, the m3u8 file contains 5 video segments.
- Each video segment URL points to downloadable video data that can be anywhere from 2 seconds to 100 seconds or longer.
- As the live class proceeds, each update of the m3u8 file contains a moving window of video segments.
Green blocks represent video segments available for download
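For reference, a live HLS playlist looks roughly like this (a hand-written illustration; the filenames, durations and sequence number are made up):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:8
#EXT-X-MEDIA-SEQUENCE:120
#EXTINF:8.0,
segment-120.ts
#EXTINF:8.0,
segment-121.ts
#EXTINF:8.0,
segment-122.ts
#EXTINF:8.0,
segment-123.ts
#EXTINF:8.0,
segment-124.ts
```

The absence of an `#EXT-X-ENDLIST` tag marks the stream as live; as the class proceeds, `#EXT-X-MEDIA-SEQUENCE` advances and old segments drop off the front of the moving window while new ones are appended at the end.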
Here are the key configurations that helped us offer the best live streaming results in bad network environments:
- The m3u8 file contains a moving window of around 30-60 seconds of video segments. This allows the player to pre-download enough video to tide over network disconnections.
- The players are forced to start downloading from the beginning of the list of segments. Typically most players download from the end of the list to keep the end-to-end latency low.
- Lastly, the video segment size was made to be somewhere between 6-10 seconds. Anything smaller or larger will add a significant download overhead on bad networks.
Please keep in mind that these configurations are only suitable for cases where a few seconds latency in live streaming is acceptable.
Balancing Playback vs Energy Meter
With so many things running at the same time on a mobile phone during a fitness class, shortages of resources like CPU, GPU and even memory are bound to happen.
On low-cost devices, the Energy Meter impacts the player's video download due to CPU contention. Thus, the Energy Meter is given a slightly lower thread priority than the video download, which in turn has a lower priority than the UI thread.
This ensures that the UI is always responsive and the player is given more CPU when necessary.
The Energy Meter, like any game feature, is accompanied by animations to make it interesting. To keep memory usage low, the UI and animations were built so that, on low-cost phones, as much memory as possible stays reserved for the Energy Meter and the player's download buffers.
React Native UI logic runs in the JS layer, so animating something on the UI thread at a smooth 60fps would normally require constant communication between the JS thread and the UI thread. To avoid this, all our animations are rendered using the useNativeDriver option of React Native's Animated library, which runs them entirely on the native side.
Optimising Data Exchanges with Server
Lastly, the app pushes a lot of data to the servers during a session, and it also expects data back, for instance a friend's current score. To optimise this for both the app and the server, data exchanges are categorised as follows:
| Category | Strategy | Examples |
| --- | --- | --- |
| Frequent, up-to-date data exchanges | From the app, batch all such data and send it in a single call every 10 seconds. The server sends back any data the app needs in the response to the same call. | User score, user rank |
| Frequent data that can be reported with a delay | From the app, batch and send to the server every 30 seconds. | App and playback metrics |
| Static data | Reported to the server only once during the class. | User location, device ID |
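The batching in the first two rows can be sketched as follows; the class name and transport call are illustrative, and timestamps are passed in explicitly to keep the sketch self-contained:

```python
class Batcher:
    def __init__(self, interval_s, send):
        self.interval_s = interval_s  # e.g. 10 s for scores, 30 s for metrics
        self.send = send              # stand-in for the real HTTP call
        self.pending = []
        self.last_flush = 0.0

    def report(self, item, now):
        """Buffer an item; flush everything as one request when the interval elapses."""
        self.pending.append(item)
        if now - self.last_flush >= self.interval_s:
            self.send(self.pending)
            self.pending = []
            self.last_flush = now

sent = []
scores = Batcher(10, sent.append)
scores.report({"score": 5}, now=1)
scores.report({"score": 7}, now=4)
scores.report({"score": 9}, now=11)  # 10 s elapsed: one batched request
```

Batching turns dozens of tiny per-second requests into one call every interval, which matters when 3,00,000 phones are reporting at once.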
So that was our journey so far. We have managed to make working out at home, using just your phone, fun and motivating.
Presently, with the abundance of workout content available online, we hope you find a friend in cult.live who pushes you to do better. We wish to take this vision further. To do that, we are seeking ways to make the Energy Meter smarter by teaching it to recognise different movements and reward you with points accordingly. We are also working on counting repetitive movements and on understanding what it means to hold a pose, which is crucial for yoga classes.
Similar to how cult centres proved that you do not need expensive gym equipment to have a fun group workout, we believe that cure.fit live classes can revolutionize the way you enjoy at-home workouts with just your phone.
See something you want to work on? Come help us build the future! See https://www.cure.fit/careers for opportunities.