FloSports’ February 2019 Engineering Report

New Live Web Experience 50/50 Test, VOD CDN switches and Team News

Truong-An Thai
FloSports Engineering

We’re kicking off our first ever public FloSports Engineering Report! Our aim is to highlight some of the awesome things our team has shipped, engineering improvements, team updates, and learnings, all through our Engineering lens.

The TL;DR

  • Excited to welcome our first Lead Android Developer! We also transitioned one of our Product-Engineers to the Infrastructure/DevOps team.
  • We created v1 of our Engineering Project Health. We also transitioned to 100% Dev Releases on our Platform Web/API projects.
  • Launched our new Live Web Experience 50/50 experiment
  • Zero downtime during one of our largest live streaming events ever!
  • We saw some high error rates after a deploy of our FE web app and had to roll back.
  • Saved a ton of money on VOD CDN delivery
  • Live Stream origins are now 100% running on our own in-house managed systems using AWS EC2s

Team Updates

New Hires and Role Changes

We’re excited that a new Engineer will join the team as our Lead Android developer. He’ll be working on our first ever FloSports Native Android app! Our customers, partners, and tribe members can’t wait for this to be released.

We’ve also been looking to backfill a DevOps Engineering role for a few months now. Guess what? We found an internal match! Samir spent the last 2+ years on our Platform API/CMS team (“The Annimals”) and will be joining the Infrastructure (DevOps) team effective March 1st. We are enthusiastic about this change! We now have an opening for a Sr. Software/Backend Engineer.

Our Sr./Lead NOC Technician, Jack, is moving on to new opportunities after a solid run of almost 5 years at FloSports. We’ll miss you, Jack! We’re now looking for a new NOC Lead/Manager.

Engineering Project Health

At the beginning of our Q1 2019 quarterly planning, a few of us got together to identify the highest priority items. One of the initiatives that came out of it was tied to the technical health of our engineering projects. Kaleb worked with some of our engineering project leads/devs to define the items around Quality, Developer Experience (DX), Performance, Onboarding, and Measuring.

This is a great start to bring more visibility to improving the technical excellence of our projects, with shared ownership by each project contributor.

100% Dev Releases on our Platform API and Web FE Projects

Over the last year, we’ve transitioned away from a culture of developers “dev completing” their work and then sending it off to our QA team to release. We went from QA releasing every day to “Dev Releases” 3 out of 5 days a week. In February, Kodiak announced that we are going 100% Dev Releases!

This has enabled our QA team members to focus more on feature testing, act as test consultants for developers, and spend more time creating and managing automated tests.

NEW Live Web Experience 50/50 Experiment Launched!

Our team is super excited to finally put our new Live Web Experience into the hands of our users. Since February, half of our users have been bucketed into the control (old experience) and the other half into the treatment (new experience).
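A 50/50 split like this is commonly done by hashing a stable user identifier, so the same user always lands in the same bucket on every visit. The sketch below illustrates that general technique in Python. It is an illustration only, not a description of the FloSports implementation, and the function name `assign_bucket`, the experiment key `live-web-experience`, and the sample user IDs are all hypothetical.

```python
import hashlib


def assign_bucket(user_id: str, experiment: str = "live-web-experience") -> str:
    """Deterministically map a user to 'control' or 'treatment'.

    The experiment key (a hypothetical name) is mixed into the hash so that
    different experiments split the same users independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and split on parity.
    value = int.from_bytes(digest[:8], "big")
    return "treatment" if value % 2 else "control"


if __name__ == "__main__":
    # Count how a batch of made-up user IDs is split between the two buckets.
    counts = {"control": 0, "treatment": 0}
    for i in range(10_000):
        counts[assign_bucket(f"user-{i}")] += 1
    print(counts)
```

Because the assignment depends only on the user ID and the experiment key, no per-user state needs to be stored to keep the experience consistent between sessions.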

As part of learning and measuring, we’ll be looking at various quality metrics such as Play Failures (%), EBVS (Exits Before Video Start, %), Join Time, Buffer Ratio, In-Stream Failures (#) and In-Stream Failures (%). We’re already seeing a ton of QoS improvements!

Super awesome work by Patrick, Matt, and Chris over many months building the new experience on Angular, TypeScript, RxJS, Node.js and hls.js.

Oh, for kicks, here is a screenshot of what the previous Live Web Experience looked like. It was built about 3 years ago using Backbone.js + Marionette and JW Player, and hosted as a static page on S3.
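To make a few of these metrics concrete, here is a minimal sketch of how they are often computed from aggregate playback counts. The formulas follow common industry usage and are assumptions; they are not necessarily the exact definitions used by FloSports or its analytics tooling. The `SessionTotals` type, its field names, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SessionTotals:
    """Aggregate playback counts for one experiment bucket (hypothetical shape)."""
    attempts: int              # play attempts
    play_failures: int         # attempts that failed before video started
    exits_before_start: int    # viewers who left before video started (EBVS)
    playing_seconds: float     # total time spent playing
    buffering_seconds: float   # total time spent rebuffering


def pct(part: float, whole: float) -> float:
    """Return part/whole as a percentage, or 0.0 when there is no data."""
    return 100.0 * part / whole if whole else 0.0


def qos_summary(t: SessionTotals) -> dict:
    """Compute percentage metrics under the assumed definitions above."""
    return {
        "play_failures_pct": pct(t.play_failures, t.attempts),
        "ebvs_pct": pct(t.exits_before_start, t.attempts),
        "buffer_ratio_pct": pct(
            t.buffering_seconds, t.playing_seconds + t.buffering_seconds
        ),
    }


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    print(qos_summary(SessionTotals(200, 4, 10, 9500.0, 500.0)))
```

Computing the same summary for the control and treatment buckets gives a like-for-like comparison of the two experiences.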

Fist Bump: We had Zero Downtime During our Largest Live Streaming Event Ever!

FloSports typically streams hundreds of events on any given weekend, and most events run all day, with users spread out over one or more days. The 2019 Iowa at Oklahoma State Wrestling event on February 24th started with the #1 and #2 most anticipated matches happening right at 2 PM CST. This meant that every FloWrestling viewer would hit the site at the same time to log in or sign up.

We serviced all login requests within the 1-hour window at an average of 108ms.

We credit the zero downtime on the application and infrastructure side to a few factors. We previously failed in an embarrassing way for a similarly large event 2 years ago. Never Forget. Since then we’ve improved our process & tech a ton.

Process: Christian Pyles and Ray Machuca from the content team gave us a heads-up on the big event weeks in advance. This gave us time to prepare, including identifying risk with new code deploys, reviewing load testing numbers, evaluating cloud infrastructure limits, and making code optimizations to improve application performance.

Tech: We made huge investments in our Live API 3.0 and LIVE CDN integrations many months ago, and we finally got to battle test them. We’ve also built load testing systems to measure our throughput on various services. Additionally, we’ve engineered our application and infrastructure so that we can easily scale servers and databases on demand without any downtime.

But Something Did Happen a Few Days Before the Big Event…

While the big live event day went super smooth on the application and infra side, we did have an issue just 2 days before.

A change deployed to the Platform FE, our web app, broke pages initially loaded with query params, causing them to “hang” and eventually return a 503 error.

New Relic — 3.0 Frontend Production showed a HUGE spike in errors and load times after the deploy.

We were able to roll back shortly afterward, which resolved the user-impacting production issue.

These production issues will happen. By conducting “blameless postmortems” or actionable retrospectives at FloSports, we foster a culture that aims to improve our systems quickly and encourages those involved to share information about issues without fear. Notes from our postmortem for this incident are below:

Estimated Impact to Business:

  • Ad Campaigns: spent money on pages that did not load
  • Marketing Campaigns: Social, Twitter, Email
  • Facebook links
  • MileSplit login & signup (uses redirect query params)

Root Cause:

  • Site logic using query params was introduced
  • Platform FE deploy did not follow the Deployment Process
  • Platform FE was not deployed to staging, but directly to Production
  • Platform FE GhostInspector tests were not run against the code before it was deployed to Production

Recovery & Resolution:

  • Rolled back Platform FE and reverted the offending code
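The process gaps named in the root cause (skipping staging and skipping browser tests) are the kind of thing a simple pre-deploy gate can catch. Below is a minimal sketch of such a gate in Python. It is a generic illustration under stated assumptions, not the FloSports Deployment Process: the `ReleaseCandidate` type, its fields, and `production_blockers` are hypothetical names, and a real pipeline would fill in these flags from its own CI and test tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReleaseCandidate:
    """State of a build as it moves through the pipeline (hypothetical shape)."""
    commit: str
    deployed_to_staging: bool
    browser_tests_passed: bool


def production_blockers(rc: ReleaseCandidate) -> List[str]:
    """Return the reasons this build must not go to production (empty if none)."""
    blockers = []
    if not rc.deployed_to_staging:
        blockers.append("build has not been deployed to staging")
    if not rc.browser_tests_passed:
        blockers.append("browser tests have not passed against this build")
    return blockers


if __name__ == "__main__":
    # A made-up build that skipped both steps, as in the incident described above.
    rc = ReleaseCandidate("abc123", deployed_to_staging=False, browser_tests_passed=False)
    for reason in production_blockers(rc):
        print(f"BLOCKED: {reason}")
```

Returning a list of reasons rather than a single boolean means a blocked deploy explains itself, which keeps the check blameless and actionable.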

Major Cost Savings: VOD CDN and In-House Live Origins

As of the beginning of January, having moved off Ooyala as our OVP provider and built our own in-house Video Uploader service, we can now choose which CDN to partner with. Jason has been talking with various CDN providers like Fastly and StackPath, since CloudFront was getting pretty expensive.

Kaleb did much of the POC work to connect our application’s VOD delivery to Fastly’s CDN, with support from Matt and Josh on our Infra team.

We ended up saving nearly 10x on CDN costs. However, our cost for Data Transfer Out from AWS S3 to the CDN partner has increased. Still, massive savings!

Live Update by Niels, our Live Product Manager/Solutions Architect:

The last of our ScaleEngine origins will be decommissioned tonight. Starting tomorrow, March 1, all of our events will be streamed by our in-house, Flo-managed EC2 origins (nicknamed Florigins).

This was a year-long initiative broken into 3 phases. Strong work by our engineers Steve, Karl, Jen, Samir, and Nick Schirmer!

Not only is this a major cost-savings initiative, but we now also have full control over our live-streaming destiny :)

Wow, A Lot Happens Nowadays at FloSports Engineering!

Our team is quite happy with how the year is going so far. We actually have a handful of product-engineering projects and initiatives in progress that will be revealed in the coming months. Looking forward to March and April!

So what do you think of this Engineering Report? Do you want to see more of these? What would you like us to share more of? Let us know.
