Graviton lurking
lurking in @lizthegrey and @QuinnyPig's Slack discussion re: Graviton2 instance availability and AWS's ability to shift capacity
— shelby reliability engineer (@shelbyspees) August 24, 2020
I'd feel like a creep but honestly this is how I learn fastest, just being immersed among experts
I've never had to think much about hardware. It's usually been abstracted away from me or already decided upon before I joined a team.
— shelby reliability engineer (@shelbyspees) August 24, 2020
Honeycomb is unique because it was founded by a team of ops and infra experts, and those roots continue to influence our priorities.
Most apps and services don't need to be on the bleeding edge of hardware technology.
— shelby reliability engineer (@shelbyspees) August 24, 2020
Most apps and services don't involve implementing your own distributed column store and query engine, though.
The team also didn't start out on the bleeding edge of hardware back in 2016.
— shelby reliability engineer (@shelbyspees) August 24, 2020
As a general rule, we use what works and what we know until that's no longer serving us, and then we figure out the highest-impact change to improve performance.
No resume-driven development here.
I know @lizthegrey loves this work, but she wouldn't be investing in these hardware experiments if it didn't have the potential for huge performance and scalability gains. https://t.co/3rux5DHuJp
— shelby reliability engineer (@shelbyspees) August 24, 2020
Matt Button responded with a question, and I shared some of my perspective:
I mean I've never even thought about AWS running out of an instance type before. I suppose I knew it was possible in theory but like, idk where they come from 🤷♀️ the stork brings them 😅
— shelby reliability engineer (@shelbyspees) August 25, 2020
But hearing that "AWS can shift capacity on a scale of minutes to hours" and apparently GCP takes days? I can start to connect the dots about what happens behind the scenes.
— shelby reliability engineer (@shelbyspees) August 25, 2020
I've never been in a server room (wow take away my ops card) but I've seen them on TV. I get that there's a physical rack somewhere, but EC2 instances are VMs so I always figured one rack thingy gave you more than one VM.
— shelby reliability engineer (@shelbyspees) August 25, 2020
This may be different for beefier instance types though.
I'm still at about this level of understanding:
mount rack hardware -> ??? -> VMs available
— shelby reliability engineer (@shelbyspees) August 25, 2020
It's been ages since I created an EC2 instance in the console, poking around the options there can probably tell me a lot. (Probably won't get to that today though.)
Now I'm wondering how spot instances work under the hood.
— shelby reliability engineer (@shelbyspees) August 25, 2020
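(If you'd rather poke at those options programmatically instead of clicking through the console, the EC2 API exposes the same specs. A minimal sketch with boto3, assuming you already have credentials and a default region configured:)

```python
# Minimal sketch, assuming boto3 is installed and AWS credentials/region are configured.
# Pulls published specs for a few Graviton2 (m6g) sizes via the EC2 DescribeInstanceTypes API.
import boto3

ec2 = boto3.client("ec2")

resp = ec2.describe_instance_types(
    InstanceTypes=["m6g.xlarge", "m6g.2xlarge", "m6g.16xlarge", "m6g.metal"]
)

for it in sorted(resp["InstanceTypes"], key=lambda t: t["VCpuInfo"]["DefaultVCpus"]):
    print(
        it["InstanceType"],
        it["VCpuInfo"]["DefaultVCpus"], "vCPUs,",
        it["MemoryInfo"]["SizeInMiB"] // 1024, "GiB,",
        "bare metal" if it["BareMetal"] else "virtualized",
    )
```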
Liz chimed in with her expertise as well:
yup, for m6g, .metal == .16xlarge, so you can get 16 xlarges out of one physical host, or 8 .2xlarge, etc
— Liz Fong-Jones (方禮真) (@lizthegrey) August 25, 2020
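(To make that sizing math concrete, here's a back-of-the-envelope sketch. It assumes the published m6g vCPU counts: 4 for .xlarge, 8 for .2xlarge, 64 for .16xlarge and .metal, and shows how many of each size carve out of one physical host:)

```python
# Back-of-the-envelope sketch of the point above, assuming published m6g vCPU counts.
# One m6g.metal host (64 Graviton2 cores) can be carved into smaller virtualized sizes.
M6G_VCPUS = {
    "m6g.xlarge": 4,
    "m6g.2xlarge": 8,
    "m6g.4xlarge": 16,
    "m6g.16xlarge": 64,  # same footprint as .metal, just virtualized
}
METAL_VCPUS = 64  # m6g.metal == one whole physical host

for size, vcpus in M6G_VCPUS.items():
    print(f"{METAL_VCPUS // vcpus:>2} x {size} per physical host")
# -> 16 x m6g.xlarge, 8 x m6g.2xlarge, 4 x m6g.4xlarge, 1 x m6g.16xlarge
```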
Cost, too. Getting opex and COGS under control is essential to operating a healthy SaaS business.
— Liz Fong-Jones (方禮真) (@lizthegrey) August 24, 2020
Unspoken assumption, but 💯 definitely cost is a major factor
— shelby reliability engineer (@shelbyspees) August 25, 2020