Continuous Deployment at Etsy: A Tale of Two Approaches
Ross Snyder
[email protected]
@beamrider9
March 9, 2013
A quick primer on Etsy
Etsy is: The global marketplace we make together.
Etsy is: The premier destination for handmade goods, vintage items, and craft supplies.
[Image captions: simplertimestoys, lacklusterco, norwesterseaglass]
Etsy quick facts:
(as of March 2013)
• 22+ million members
• 800,000+ active shops
• 18+ million items currently for sale
• 20 cents to list item, 3.5% transaction fee
• 400+ employees (majority in Brooklyn)
Since opening its doors in June 2005, Etsy has grown virtually non-stop.
[Chart: Gross Merchandise Sales ($MM) by year, 2005 to 2012; vertical axis from $0 to $1,000]
A nice problem to have: “Our site is so successful, how can we move fast enough to keep up with demand?”
CONTINUOUS DEPLOYMENT
Etsy: The Early Years (2005 - 2008)
1. Spend significant time (weeks) writing code
2. Painful source control merge
3. Hand off to someone else to deploy
4. Deploy, site goes down
5. Roll back deploy
6. Spend hours (days?) fixing bugs
7. Go back to step 2
WATERFALL!
Pros: Early Etsy engineers used this release cycle to bootstrap the marketplace from nothing. Forever grateful.
Cons:
• Large changesets
• Infrequent deploys
• Weak confidence in deploy success
• Significant time spent deploying
• Low ability to experiment/iterate/react
• Developer stress/unhappiness
By late 2008, Etsy is still a startup, but has the deploy process of a much bulkier company.
Popularity is on the verge of outpacing capacity.
Etsy: Today
1. Small changesets, deployed frequently
2. Engineers deploy the site
   And not just engineers, but also:
   • Designers
   • Product Folks
   • Upper Management
   • Board Members
   • Dogs
3. Deploys are fast and near-effortless
4. Most changes behind config flags (safer deploys)
5. Copious graphs/metrics to assess deploy
6. If issues, fix immediately & roll forward
   This isn’t license to break stuff, quickly. Engineer-driven QA and solid unit testing are integral parts of the process.
7. Repeat 25+ times per day, every day
Then: 1. Weeks writing code 2. Painful merge 3. Hand off to deployers 4. Deploy, site down 5. Roll back deploy 6. Fix bugs, go to step 2
Now: 1. Small changesets 2. Engineers deploy 3. Deploys are fast 4. Changes behind flags 5. Copious graphs/metrics 6. Fix fast & roll forward
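To make "changes behind flags" concrete, here is a minimal sketch of a config flag with a percentage ramp-up. It is an illustration only: the config layout, the flag names `new_checkout` and `new_search`, and the helpers `bucket` and `is_enabled` are hypothetical and are not taken from Etsy's actual tooling.

```python
import hashlib

# Hypothetical flag config (an assumption for this sketch). It ships
# like any other small change, so code can be deployed "dark".
FLAGS = {
    "new_checkout": {"enabled": True, "percent": 10},   # ramped to 10% of users
    "new_search": {"enabled": False, "percent": 100},   # deployed but switched off
}


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str, flags: dict = FLAGS) -> bool:
    """Return True if the flag is on for this user; unknown flags are off."""
    cfg = flags.get(flag)
    if cfg is None or not cfg["enabled"]:
        return False
    return bucket(flag, user_id) < cfg["percent"]


def checkout(user_id: str) -> str:
    """Old and new code paths live side by side behind the flag."""
    if is_enabled("new_checkout", user_id):
        return "new checkout flow"
    return "old checkout flow"
```

Under this design, turning a feature off is a config-only change rather than a code rollback, which fits the "fix fast & roll forward" step.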
Etsy Deploy Stats: 2012
• Deployed to production 6,419 times
• On average, 535/month, 25/day
• Additional 3,851 config-only deploys
• 196 different people deployed to prod
• Nov/Dec 2012: deployed 752 times
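The averages can be sanity-checked with a few lines of arithmetic. One assumption is made here that the slide does not state: the per-day figure is taken to mean per weekday (Monday to Friday), with no adjustment for holidays. The helper `weekdays_in_year` is introduced only for this sketch.

```python
from datetime import date, timedelta

DEPLOYS_2012 = 6419  # production deploys in 2012, from the slide


def weekdays_in_year(year: int) -> int:
    """Count Monday-to-Friday days in the given calendar year."""
    day = date(year, 1, 1)
    count = 0
    while day.year == year:
        if day.weekday() < 5:  # 0..4 are Monday..Friday
            count += 1
        day += timedelta(days=1)
    return count


per_month = DEPLOYS_2012 / 12
# Assumption: "per day" means per weekday, ignoring holidays.
per_weekday = DEPLOYS_2012 / weekdays_in_year(2012)

print(f"{per_month:.1f} deploys/month, {per_weekday:.1f} deploys/weekday")
```

Under that assumption both results round to the figures on the slide (535 per month, 25 per day).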
Why does it work?
Continuous Deployment Math
• N = # of deploys
• P = probability of site degradation
• S = average severity of degradation
• T = time to detect/resolve
Expected Downtime = N * P * S * T
Continuous Deployment Math
                                      Before    Now
N (# of deploys)                      1         250   ↑↑↑↑
P (prob. of degradation)              0.5       0.1   ↓
S (avg. severity of degradation)      0.7       0.05  ↓↓
T (time to detect/resolve)            100       5     ↓↓↓
E.D. (expected downtime)              35        6.25
(all numbers completely arbitrary)
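The two expected-downtime figures follow directly from the product N * P * S * T. The short sketch below reproduces them; the class name `DeployRegime` and its field names are assumptions made for illustration, and the inputs are the slide's own arbitrary numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployRegime:
    n: float  # number of deploys
    p: float  # probability that a deploy degrades the site
    s: float  # average severity of a degradation
    t: float  # time to detect/resolve a degradation

    def expected_downtime(self) -> float:
        """Expected downtime as the product N * P * S * T."""
        return self.n * self.p * self.s * self.t


# Values from the slide (which calls them completely arbitrary).
before = DeployRegime(n=1, p=0.5, s=0.7, t=100)
now = DeployRegime(n=250, p=0.1, s=0.05, t=5)

print(f"Before: {before.expected_downtime():.2f}")
print(f"Now:    {now.expected_downtime():.2f}")
```

In this illustration the number of deploys rises 250-fold, yet expected downtime falls from 35 to 6.25, because P, S, and T each shrink with smaller changesets and faster detection.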
Big Takeaway
Etsy circa 2013 (400+ employees) acts, in some ways, more like a startup than Etsy circa 2008 (40+ employees).
Continuous Deployment makes possible: “Continuous Experimentation”
http://etsy.me/continuous-experimentation
Continuous Experimentation
1. Small changes
2. Run experiment (A/B test)
3. Analyze data
4. Re-examine assumptions
Repeat continuously in pursuit of larger goals.
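As one possible way to carry out the "Analyze data" step, the sketch below compares conversion rates between the two arms of an A/B test using a standard two-proportion z-test. The talk does not say which statistical method Etsy uses, so the choice of test, the function name `two_proportion_z`, and the sample counts are all assumptions for illustration.

```python
import math


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a is the control arm, conv_b / n_b is the variant arm.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts, for illustration only.
z, p_value = two_proportion_z(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between arms is unlikely to be noise, which feeds the "Re-examine assumptions" step before the next small change.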
Heard since 2010: “Neat experiment, but this will never scale.”
As of 2013, Etsy has 100+ engineers still going strong.
Some Etsy Customizations
Deploying is a first-class feature. Inability to deploy is a P1 incident (same as site down).
We continuously deploy not just the main Etsy website, but as much as possible:
• Internal admin site
• API
• Big data
• Search
• Blog
• Deployinator itself
In the rare case we can’t continuously deploy, we create alternative tools:
• Database schema changes
• PCI-DSS environment (credit cards)
We do continuously deploy as much of our payment processing as is safe & legal (98%).
Keeping deploys fast is paramount and worth the investment in manpower & hardware.
Continuous deployment is all about moving forward, sometimes at the expense of the past.
Our solution: engineering-wide bug rotation, one day a month, every engineer participates.
Fun Fact: Continuous Deployment is a fantastic recruitment tool for attracting engineers who like to move fast and get stuff done.
Learn more: http://codeascraft.etsy.com/
Etsy open source (Deployinator, StatsD): http://etsy.github.com/
Join the fun: http://www.etsy.com/careers