[{"data":1,"prerenderedAt":759},["ShallowReactive",2],{"/en-us/blog/how-we-diagnosed-and-resolved-redis-latency-spikes":3,"navigation-en-us":36,"banner-en-us":464,"footer-en-us":481,"Matt Smiley":725,"next-steps-en-us":738,"footer-source-/en-us/blog/how-we-diagnosed-and-resolved-redis-latency-spikes/":753},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"seo":8,"content":16,"config":26,"_id":29,"_type":30,"title":31,"_source":32,"_file":33,"_stem":34,"_extension":35},"/en-us/blog/how-we-diagnosed-and-resolved-redis-latency-spikes","blog",false,"",{"title":9,"description":10,"ogTitle":9,"ogDescription":10,"noIndex":6,"ogImage":11,"ogUrl":12,"ogSiteName":13,"ogType":14,"canonicalUrls":12,"schema":15},"How we diagnosed and resolved Redis latency spikes with BPF and other tools","How we uncovered a three-phase cycle involving two distinct saturation points and a simple fix to break that cycle.","https://res.cloudinary.com/about-gitlab-com/image/upload/v1749667913/Blog/Hero%20Images/clocks.jpg","https://about.gitlab.com/blog/how-we-diagnosed-and-resolved-redis-latency-spikes","https://about.gitlab.com","article","\n                        {\n        \"@context\": \"https://schema.org\",\n        \"@type\": \"Article\",\n        \"headline\": \"How we diagnosed and resolved Redis latency spikes with BPF and other tools\",\n        \"author\": [{\"@type\":\"Person\",\"name\":\"Matt Smiley\"}],\n        \"datePublished\": \"2022-11-28\",\n      }",{"title":9,"description":10,"authors":17,"heroImage":11,"date":19,"body":20,"category":21,"tags":22},[18],"Matt Smiley","2022-11-28","If you enjoy performance engineering and peeling back abstraction layers to\nask underlying subsystems to explain themselves, this article’s for you. The\ncontext is a chronic Redis latency problem, and you are about to tour a\npractical example of using BPF and profiling tools in concert with standard\nmetrics to reveal unintuitive behaviors of a complex system.\n\n\nBeyond the tools and techniques, we also use an iterative hypothesis-testing\napproach to compose a behavior model of the system dynamics. This model\ntells us what factors influence the problem's severity and triggering\nconditions.\n\n\nUltimately, we find the root cause, and its remedy is delightfully boring\nand effective. We uncover a three-phase cycle involving two distinct\nsaturation points and a simple fix to break that cycle. Along the way, we\ninspect aspects of the system’s behavior using stack sampling profiles, heat\nmaps and flamegraphs, experimental tuning, source and binary analysis,\ninstruction-level BPF instrumentation, and targeted latency injection under\nspecific entry and exit conditions.\n\n\nIf you are short on time, the takeaways are summarized at the end. But the\njourney is the fun part, so let's dig in!\n\n\n## Introducing the problem: Chronic latency\n\n\nGitLab makes extensive use of Redis, and, on GitLab.com SaaS, we use\n[separate Redis\nclusters](https://handbook.gitlab.com/handbook/engineering/infrastructure/production/architecture/#redis-architecture)\nfor certain functions. This tale concerns a Redis instance acting\nexclusively as a least recently used (LRU) cache.\n\n\nThis cache had a chronic latency problem that started occurring\nintermittently over two years ago and in recent months had become\nsignificantly worse: Every few minutes, it suffered from bursts of very high\nlatency and corresponding throughput drop, eating into its Service Level\nObjective (SLO). 
These latency spikes impacted user-facing response times and [burned error budgets](https://gitlab.com/gitlab-org/gitlab/-/issues/360578#note_966597336) for dependent features, and this is what we aimed to solve.

**Graph:** Spikes in the rate of extremely slow (1 second) Redis requests, each corresponding to an eviction burst

![Graph showing spikes in the slow request rate every few minutes](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/00_redis_slow_request_rate_spikes_during_each_eviction_burst.png)

In prior work, we had already completed several mitigating optimizations. These sufficed for a while, but organic growth had resurfaced this as an important [long-term scaling problem](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#why-is-it-important-to-get-to-the-root-of-the-latency-spikes). We had also already ruled out externally triggered causes, such as request floods, connection rate spikes, and host-level resource contention. These latency spikes were consistently associated with memory usage reaching the eviction threshold (`maxmemory`), not with changes in client traffic patterns or other processes competing with Redis for CPU time, memory bandwidth, or network I/O.

We [initially thought](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1567) that Redis 6.2’s new [eviction throttling mechanism](https://github.com/redis/redis/pull/7653) might alleviate our eviction burst overhead. It did not. That mechanism solves a different problem: It prevents a stall condition where a single call to `performEvictions` could run arbitrarily long. In contrast, during this analysis we [discovered](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_977816216) that our problem (both before and after upgrading Redis) was related to numerous calls collectively reducing Redis throughput, rather than a few extremely slow calls causing a complete stall.

To discover our bottleneck and its potential solutions, we needed to investigate Redis’s behavior during our workload’s eviction bursts.

## A little background on Redis evictions

At the time, our cache was oversubscribed, trying to hold more cache keys than the [configured `maxmemory` threshold](https://redis.io/docs/reference/eviction/) allowed, so evictions from the LRU cache were expected. But the dense concentration of that eviction overhead was surprising and troubling.

Redis is essentially single-threaded. With a few exceptions, the “main” thread does almost all tasks serially, including handling client requests and evictions, among other things. Spending more time on X means there is less remaining time to do Y, so think about queuing behavior as the story unfolds.
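For orientation, the eviction behavior discussed next is governed by a handful of settings that can be inspected on a live instance. This is a minimal sketch, assuming stock `redis-cli`; the policy value shown is typical for a pure LRU cache, not necessarily our exact production configuration:

```sh
# Inspect the eviction-related configuration of a running Redis instance.
redis-cli CONFIG GET maxmemory                    # memory budget in bytes; 0 = unlimited
redis-cli CONFIG GET maxmemory-policy             # e.g. allkeys-lru for a pure LRU cache
redis-cli CONFIG GET maxmemory-eviction-tenacity  # Redis >= 6.2: eviction effort per call
```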
Whenever Redis reaches its `maxmemory` threshold, it frees memory by evicting some keys, aiming to do just enough evictions to get back under `maxmemory`. However, contrary to expectation, the metrics for memory usage and eviction rate (shown below) indicated that instead of a continuous steady eviction rate, there were abrupt burst events that freed much more memory than expected. After each eviction burst, no evictions occurred until memory usage climbed back up to the `maxmemory` threshold again.

**Graph:** Redis memory usage drops by 300-500 MB during each eviction burst:

![Memory usage repeatedly rises gradually to 64 GB and then abruptly drops](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/01_redis_memory_usage_dips_during_eviction_bursts.png)

**Graph:** Key eviction spikes match the timing and size of the memory usage dips shown above

![Eviction counter shows a large spike each time the previous graph showed a large memory usage drop](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/02_redis_eviction_bursts.png)

This apparent excess of evictions became the central mystery. Initially, we thought answering that question might reveal a way to smooth the eviction rate, spreading out the overhead and avoiding the latency spikes. Instead, we discovered that these bursts are an interaction effect that we need to avoid, but more on that later.

## Eviction bursts cause CPU saturation

As shown above, we had found that these latency spikes correlated perfectly with large spikes in the cache’s eviction rate, but we did not yet understand why the evictions were concentrated into bursts that lasted a few seconds and occurred every few minutes.

As a first step, we wanted to verify a causal relationship between eviction bursts and latency spikes.

To test this, we used [`perf`](https://www.brendangregg.com/perf.html) to run a CPU sampling profile on the Redis main thread. Then we applied a filter to split that profile, isolating the samples where it was calling the [`performEvictions` function](https://github.com/redis/redis/blob/6.2.6/src/evict.c#L512). Using [`flamescope`](https://github.com/Netflix/flamescope), we can visualize the profile’s CPU usage as a [subsecond offset heat map](https://www.brendangregg.com/HeatMaps/subsecondoffset.html), where each second on the X axis is folded into a column of 20 msec buckets along the Y axis. This visualization style highlights sub-second activity patterns. Comparing these two heat maps confirmed that during an eviction burst, `performEvictions` is starving all other main thread code paths for CPU time.

**Graph:** Redis main thread CPU time, excluding calls to `performEvictions`

![Heat map shows one large gap and two small gaps in an otherwise uniform pattern of 70 percent to 80 percent CPU usage](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/03_heat_map_of_redis_main_thread_during_eviction_burst__excluding_performEvictions.png)

**Graph:** Remainder of the same profile, showing only the calls to `performEvictions`

![This heat map shows the gaps in the previous heat map were CPU time spent performing evictions](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/04_heat_map_of_redis_main_thread_during_eviction_burst__only_performEvictions.png)

These results confirm that eviction bursts are causing CPU starvation on the main thread, which acts as a throughput bottleneck and increases Redis’s response time latency. These CPU utilization bursts typically lasted a few seconds, so they were too short-lived to trigger alerts but were still user-impacting.

For context, the following flamegraph shows where `performEvictions` spends its CPU time. There are a few interesting things here, but the most important takeaways are:

* It gets called synchronously by `processCommand` (which handles all client requests).

* It handles many of its own deletes. Despite its name, the `dbAsyncDelete` function only delegates deletes to a helper thread under certain conditions, which turn out to be rare for this workload.

![Flamegraph of calls to function performEvictions, as described above](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/05_flamegraph_of_redis_main_thread_during_eviction_burst__only_performEvictions.png)

For more details on this analysis, see the [walkthrough and methodology](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_854745083).
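For readers who want to reproduce this style of analysis, here is a minimal sketch of the capture-and-split step. The awk stack filter and file names are our illustration of the technique, not necessarily the exact commands used in the walkthrough linked above:

```sh
# Sample the Redis main thread's on-CPU stacks at 99 Hz for 60 seconds.
# (The oldest redis-server thread is the main thread; its TID equals the PID.)
sudo perf record -F 99 -g --tid "$(pgrep -o redis-server)" -- sleep 60

# Dump the samples as text; perf script emits one stack per blank-line-separated record.
sudo perf script > redis_profile.txt

# Split the profile into stacks that do / do not contain performEvictions.
awk 'BEGIN { RS=""; ORS="\n\n" }  /performEvictions/' redis_profile.txt > evictions_only.txt
awk 'BEGIN { RS=""; ORS="\n\n" } !/performEvictions/' redis_profile.txt > evictions_excluded.txt
```

Each of the two output files can then be loaded into `flamescope` separately to render the paired heat maps shown above.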
## How fast are individual calls to `performEvictions`?

Each incoming request to Redis is handled by a call to `processCommand`, and it always concludes by calling the `performEvictions` function. That call to `performEvictions` is frequently a no-op, returning immediately after checking that the `maxmemory` threshold has not been breached. But when the threshold is exceeded, it will continue evicting keys until it either reaches its `mem_tofree` goal or exceeds its configured time limit per call.

The CPU heat maps shown earlier proved that `performEvictions` calls were collectively consuming a large majority of CPU time for up to several seconds.

To complement that, we also measured the wall clock time of individual calls.

Using the `funclatency` CLI tool (part of the [BCC suite of BPF tools](https://github.com/iovisor/bcc)), we measured call duration by instrumenting entry and exit from the `performEvictions` function and aggregated those measurements into a [histogram](https://en.wikipedia.org/wiki/Histogram) at 1-second intervals. When no evictions were occurring, the calls were consistently low latency (4-7 usecs/call). This is the no-op case described above (including 2.5 usecs/call of instrumentation overhead). But during an eviction burst, the results shift to a bimodal distribution, combining the fast no-op calls with much slower calls that are actively performing evictions:
```
$ sudo funclatency-bpfcc --microseconds --timestamp --interval 1 --duration 600 \
    --pid $( pgrep -o redis-server ) \
    '/opt/gitlab/embedded/bin/redis-server:performEvictions'
...

23:54:03
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 576      |************                            |
         4 -> 7          : 1896     |****************************************|
         8 -> 15         : 392      |********                                |
        16 -> 31         : 84       |*                                       |
        32 -> 63         : 62       |*                                       |
        64 -> 127        : 94       |*                                       |
       128 -> 255        : 182      |***                                     |
       256 -> 511        : 826      |*****************                       |
       512 -> 1023       : 750      |***************                         |
```

This measurement also directly confirmed and quantified the throughput drop in Redis requests handled per second: The call rate to `performEvictions` (and hence to `processCommand`) dropped to 20% of its norm from before the evictions began, from 25K to 5K calls per second.

This has a huge impact on clients: New requests are arriving at 5x the rate they are being completed. And crucially, we will see soon that this asymmetry is what drives the eviction burst.

For more details on this analysis, see the [safety check](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_857869826) for instrumentation overhead and the [results walkthrough](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_857907521). And for more general reference, the BPF instrumentation overhead estimate is based on these [benchmark results](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1383).

## Experiment: Can tuning mitigate eviction-driven CPU saturation?

The analyses so far had shown that evictions were severely starving the Redis main thread for CPU time. There were still important unanswered questions (which we will return to shortly), but this was already enough info to [suggest some experiments](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_859236777) to test potential mitigations:

* Can we spread out the eviction overhead so it takes longer to reach its goal but consumes a smaller percentage of the main thread’s time?

* Are evictions freeing more memory than expected due to scheduling a lot of keys to be asynchronously deleted by the [lazyfree mechanism](https://github.com/redis/redis/blob/6.2.6/redis.conf#L1079)? Lazyfree is an optional feature that lets the Redis main thread [delegate to an async helper thread](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_859236777) the expensive task of deleting keys that have more than 64 elements. These async evictions do not count immediately towards the eviction loop’s memory goal, so if many keys qualify for lazyfree, this could potentially drive many extra iterations of the eviction loop.
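Both of these settings are tunable at runtime, which made the experiments cheap to try. A rough sketch of the two experiments, with values illustrative rather than prescriptive:

```sh
# Experiment 1: minimum eviction tenacity, so each performEvictions call does less work.
redis-cli CONFIG SET maxmemory-eviction-tenacity 0

# Experiment 2: disable lazyfree for evictions, forcing synchronous deletes on the main thread.
redis-cli CONFIG SET lazyfree-lazy-eviction no

# Observe the effect across the next eviction burst.
redis-cli INFO stats  | grep evicted_keys
redis-cli INFO memory | grep '^used_memory:'
```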
The [answers](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7172#note_971197943) to both turned out to be no:

* Reducing `maxmemory-eviction-tenacity` to its minimum setting still did not make `performEvictions` cheap enough to avoid accumulating a request backlog. It did increase response rate, but arrival rate still far exceeded it, so this was not an effective mitigation.

* Disabling `lazyfree-lazy-eviction` did not prevent the eviction burst from dropping memory usage far below `maxmemory`. Those lazyfrees represent a small percentage of reclaimed memory. This rules out one of the potential explanations for the mystery of excessive memory being freed.

Having ruled out two potential mitigations and one candidate hypothesis, at this point we return to the pivotal question: Why are several hundred extra megabytes of memory being freed by the end of each eviction burst?

## Why do evictions occur in bursts and free too much memory?

Each round of eviction aims to free just barely enough memory to get back under the `maxmemory` threshold.

With a steady rate of demand for new memory allocations, the eviction rate should be similarly steady. The rate of arriving cache writes does appear to be steady. So why are evictions happening in dense bursts, rather than smoothly? And why do they reduce memory usage on a scale of hundreds of megabytes rather than hundreds of bytes?

Some potential explanations to explore:

* Do evictions only end when a large key gets evicted, spontaneously freeing enough memory to skip evictions for a while? No, the memory usage drop is far bigger than the largest keys in the dataset.

* Do deferred lazyfree evictions cause the eviction loop to overshoot its goal, freeing more memory than intended? No, the above experiment disproved this hypothesis.

* Is something causing the eviction loop to sometimes calculate an unexpectedly large value for its `mem_tofree` goal? We explore this next. The answer is no, but checking it led to a new insight.

* Is a feedback loop causing evictions to become somehow self-amplifying? If so, what conditions lead to entering and leaving this state? This turned out to be correct.

These were all plausible and testable hypotheses, and each would point towards a different solution to the eviction-driven latency problem.

The first two hypotheses we have already eliminated.

To test the next two, we built custom BPF instrumentation to peek at the calculation of `mem_tofree` at the start of each call to `performEvictions`.

## Observing the `mem_tofree` calculation with `bpftrace`

This part of the investigation was a personal favorite and led to a critical realization about the nature of the problem.

As noted above, our two remaining hypotheses were:

* an unexpectedly large `mem_tofree` goal

* a self-amplifying feedback loop

To differentiate between them, we used [`bpftrace`](https://github.com/iovisor/bpftrace) to instrument the calculation of `mem_tofree`, looking at its input variables and results.

This set of measurements directly tests the following:

* Does each call to `performEvictions` aim to free a small amount of memory -- perhaps roughly the size of an average cache entry? If `mem_tofree` ever approaches hundreds of megabytes, that would confirm the first hypothesis and reveal what part of the calculation was causing that large value. Otherwise, it rules out the first hypothesis and makes the feedback loop hypothesis more likely.

* Does the replication buffer size significantly influence `mem_tofree` as a feedback mechanism? Each eviction adds to this buffer, just like normal writes do. If this buffer grows large (possibly partly due to evictions) and then abruptly shrinks (due to the peer consuming it), that would cause a spontaneous large drop in memory usage, instantly ending evictions. This is one potential way for evictions to drive a feedback loop.
To peek at the values of the `mem_tofree` calculation ([script](https://gitlab.com/gitlab-com/gl-infra/scalability/uploads/cab2cd03231f8dd4819f77b44d768cb9/redis_snoop.getMaxmemoryState.sha_25a228b839a93a1395907a03f83e1eee448b0f14.production_thresholds.bt)), we needed to isolate the [correct call from `performEvictions`](https://github.com/redis/redis/blob/6.2.6/src/evict.c#L523) to the [`getMaxmemoryState`](https://github.com/redis/redis/blob/6.2.6/src/evict.c#L374-L407) function and reverse engineer its assembly to find the right instruction and register to instrument for each of the source-level variables that we wanted to capture. From that data we generate histograms for each of the following variables:

```
mem_reported = zmalloc_used_memory()        // All used memory tracked by jemalloc
overhead = freeMemoryGetNotCountedMemory()  // Replication output buffers + AOF buffer
mem_used = mem_reported - overhead          // Non-exempt used memory
mem_tofree = mem_used - maxmemory           // Eviction goal
```

_Caveat:_ Our [custom BPF instrumentation](https://gitlab.com/gitlab-com/gl-infra/scalability/uploads/cab2cd03231f8dd4819f77b44d768cb9/redis_snoop.getMaxmemoryState.sha_25a228b839a93a1395907a03f83e1eee448b0f14.production_thresholds.bt) is specific to this particular build of the `redis-server` binary, since it attaches to virtual addresses that are likely to change the next time Redis is compiled. But the approach generalizes. Treat this as a concrete example of using BPF to inspect source code variables in the middle of a function call without having to rebuild the binary. Because we are peeking at the function’s intermediate state and because the compiler inlined this function call, we needed to do binary analysis to find the correct instrumentation points. In general, peeking at a function’s arguments or return value is easier and more portable, but in this case it would not suffice.
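To make the shape of that instrumentation concrete, here is a hedged sketch of the technique rather than our production script: the instruction offset (`+0x1b2`) and register (`"ax"`) below are placeholders, and the real ones must be rediscovered by disassembling your own `redis-server` binary.

```sh
# Step 1: Disassemble the target function to find the instruction where the
# variable of interest is held in a known register.
objdump -d --no-show-raw-insn /opt/gitlab/embedded/bin/redis-server |
  awk '/<performEvictions>:/,/^$/'

# Step 2: Attach a uprobe at that instruction offset and histogram the register
# value at that point (offset and register are hypothetical; they vary per build).
sudo bpftrace -e '
uprobe:/opt/gitlab/embedded/bin/redis-server:performEvictions+0x1b2
{
  @mem_tofree_bytes = hist(reg("ax"));
}'
```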
The results:

* Ruled out the first hypothesis: Each call to `performEvictions` had a small target value (`mem_tofree` < 2 MB). This means each call to `performEvictions` did a small amount of work. Redis’s mysterious rapid drop in memory usage cannot have been caused by an abnormally large `mem_tofree` target evicting a big batch of keys all at once. Instead, there must be many calls collectively driving down memory usage.

* The replication output buffers remained consistently small, ruling out one of the potential feedback loop mechanisms.

* Surprisingly, `mem_tofree` was usually 16 KB to 64 KB, which is larger than a typical cache entry. This size discrepancy hints that cache keys may not be the main source of the memory pressure perpetuating the eviction burst once it begins.

All of the above results were consistent with the feedback loop hypothesis.

In addition to answering the initial questions, we got a bonus outcome: Concurrently measuring both `mem_tofree` and `mem_used` revealed a crucial new fact: _the memory reclaim is a completely distinct phase from the eviction burst_.

Reframing the pathology as exhibiting separate phases for evictions versus memory reclaim led to a series of realizations, described in the next section. From that emerged a coherent hypothesis explaining all the observed properties of the pathology.

For more details on this analysis, see [methodology notes](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_982498636), [build notes](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_982499538) supporting the disassembly of the Redis binary, and [initial interpretations](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_977994182).

## Three-phase cycle

With the above results indicating a distinct separation between the evictions and the memory reclaim, we can now concisely characterize [three phases](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_982623949) in the cycle of eviction-driven latency spikes.

**Graph:** Diagram (not to scale) comparing memory and CPU usage to request and response rates during each of the three phases

![Diagram summarizes the text that follows, showing CPU and memory saturate in Phase 2 until request rate drops to match response rate, after which they recover](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/06_3_phase_cycle_of_eviction_bursts.png)

**Phase 1: Not saturated (7-15 minutes)**

* Memory usage is below `maxmemory`. No evictions occur during this phase.

* Memory usage grows organically until reaching `maxmemory`, which starts the next phase.

**Phase 2: Saturated memory and CPU (6-8 seconds)**

* When memory usage reaches `maxmemory`, evictions begin.

* Evictions occur only during this phase, and they occur intermittently and frequently.

* Demand for memory frequently exceeds free capacity, repeatedly pushing memory usage above `maxmemory`. Throughout this phase, memory usage oscillates close to the `maxmemory` threshold, evicting a small amount of memory at a time, just enough to get back under `maxmemory`.

**Phase 3: Rapid memory reclaim (30-60 seconds)**

* No evictions occur during this phase.

* During this phase, something that had been holding a lot of memory starts quickly and steadily releasing it.

* Without the overhead of running evictions, CPU time is again spent mostly on handling requests (starting with the backlog that accumulated during Phase 2).

* Memory usage drops rapidly and steadily. By the time this phase ends, hundreds of megabytes have been freed.
Afterwards, the cycle restarts with Phase 1.

At the transition between Phase 2 and Phase 3, evictions abruptly end because memory usage stays below the `maxmemory` threshold.

Reaching that transition point, where memory pressure becomes negative, signals that whatever was driving the memory demand in Phase 2 has started releasing memory faster than it is consuming it, shrinking the footprint it had accumulated during the previous phase.

What is this **mystery memory consumer** that bloats its demand during Phase 2 and frees it during Phase 3?

## The mystery revealed

[Modeling the phase transitions](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_982651298) gave us some useful constraints that a viable hypothesis must satisfy. The mystery memory consumer must:

* quickly bloat its footprint to hundreds of megabytes on a timescale of less than 10 seconds (the duration of Phase 2), under conditions triggered by the start of an eviction burst

* quickly release its accumulated excess on a timescale of just tens of seconds (the duration of Phase 3), under the conditions immediately following an eviction burst

**The answer:** The client input/output buffers meet those constraints to be the mystery memory consumer.

Here is how that hypothesis plays out:

* During Phase 1 (healthy state), the Redis main thread’s CPU usage is already fairly high. At the start of Phase 2, when evictions begin, the eviction overhead saturates the main thread’s CPU capacity, quickly dropping response rate below the incoming request rate.

* This throughput mismatch between arrivals versus responses **is itself the amplifier** that takes over driving the eviction burst. As the size of that rate gap increases, the proportion of time spent doing evictions also increases.

* Accumulating a backlog of requests requires memory, and that backlog continues to grow until enough clients are stalled that the arrival rate drops to match the response rate. As clients stall, the arrival rate falls, and with it the memory pressure, eviction rate, and CPU overhead begin to reduce.

* At the equilibrium point when arrival rate falls to match response rate, memory demand is satisfied and evictions stop (ending Phase 2). Without the eviction overhead, more CPU time is available to process the backlog, so response rate increases above request arrival rate. This recovery phase steadily consumes the request backlog, incrementally freeing memory as it goes (Phase 3).

* Once the backlog is resolved, the arrival and response rates match again. CPU usage is back to its Phase 1 norm, and memory usage has temporarily dropped in proportion to the max size of Phase 2’s request backlog.
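A back-of-envelope check shows the magnitudes measured earlier are consistent with this story (the per-request figure at the end is only a rough quotient, not a measured value):

```
backlog growth ≈ arrival rate - response rate ≈ 25,000 - 5,000 = 20,000 requests/s
Phase 2 length ≈ 6-8 s                        →  roughly 120,000-160,000 queued requests
memory reclaimed in Phase 3 ≈ 300-500 MB      →  roughly 2-4 KB of buffer per queued request
```

A few kilobytes of input/output buffer per pending request is plausible, so the arithmetic supports the client-buffer hypothesis.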
We confirmed this hypothesis via a [latency injection experiment](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_987049036) showing that queuing alone explains the pathology. This outcome supports the conclusion that the extra memory demand originates from response rate falling below request arrival rate.

## Remedies: How to avoid entering the eviction burst cycle

Now that we understand the dynamics of the pathology, we can draw confident conclusions about viable solutions.

Redis evictions are only self-amplifying when all of the following conditions are present:

* **Memory saturation:** Memory usage reaches the `maxmemory` limit, causing evictions to start.

* **CPU saturation:** The baseline CPU usage by the Redis main thread’s normal workload is close enough to a whole core that the eviction overhead pushes it to saturation. This reduces the response rate below request arrival rate, inducing self-amplification via increased memory demand for request buffering.

* **Many active clients:** The saturation only lasts as long as request arrival rate exceeds response rate. Stalled clients no longer contribute to that arrival rate, so the saturation lasts longer and has a greater impact if Redis has many active clients still sending requests.

Viable remedies include:

* Avoid memory saturation by any combination of the following to make peak memory usage less than the `maxmemory` limit:
  * Reduce cache time to live (TTL)
  * Increase `maxmemory` (and host memory if needed, but watch out for [`numa_balancing` CPU overhead](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1889) on hosts with multiple NUMA nodes)
  * Adjust client behavior to avoid writing unnecessary cache entries
  * Split the cache among multiple instances (sharding or functional partitioning, helps avoid both memory and CPU saturation)
* Avoid CPU saturation by any combination of the following to make peak CPU usage for the workload plus eviction overhead less than 1 CPU core:
  * Use the processor with the fastest available single-threaded performance
  * Isolate the redis-server process (particularly its main thread) from any other competing CPU-intensive processes (dedicated host, taskset, cpuset; see the sketch after this list)
  * Adjust client behavior to avoid unnecessary cache lookups or writes
  * Split the cache among multiple instances (sharding or functional partitioning, helps avoid both memory and CPU saturation)
  * Offload work from the Redis main thread (io-threads, lazyfree)
  * Reduce eviction tenacity (only gives a minor benefit in our experiments)
More exotic potential remedies could include a new Redis feature. One idea is to exempt ephemeral allocations like client buffers from counting towards the `maxmemory` limit, instead applying that limit only to key storage. Alternatively, we could limit evictions to consume at most a configurable percentage of the main thread’s time, so that most of its time is still spent on request throughput rather than eviction overhead.

Unfortunately, either of those features would trade one failure mode for another, reducing the risk of eviction-driven CPU saturation while increasing the risk of unbounded memory growth at the process level, which could potentially saturate the host or cgroup and lead to an out-of-memory (OOM) kill. That trade-off may not be worthwhile, and in any case it is not currently an option.

## Our solution

We had already exhausted the low-hanging fruit for CPU efficiency, so we focused our attention on avoiding memory saturation.

To improve the cache’s memory efficiency, we [evaluated](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_990891708) which types of cache keys were using the most space and how much [`IDLETIME`](https://redis.io/commands/object-idletime/) they had accrued since last access. This memory usage profile identified some rarely used cache entries (which waste space), helped inform the TTL tuning by first focusing on keys with a high idle time, and highlighted some useful potential cutpoints for functionally partitioning the cache.
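That kind of evaluation can be approximated with stock `redis-cli` commands. A minimal sketch, assuming an LRU `maxmemory-policy` (otherwise `OBJECT IDLETIME` errors out) and sampling only a few thousand keys to limit load:

```sh
# For a sample of keys, record size in bytes and idle seconds since last access.
redis-cli --scan | head -n 5000 | while read -r key; do
  size=$(redis-cli MEMORY USAGE "$key")
  idle=$(redis-cli OBJECT IDLETIME "$key")
  printf '%s\t%s\t%s\n' "$size" "$idle" "$key"
done | sort -rn | head -n 20   # the 20 largest sampled keys, with their idle times
```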
We [decided](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_1014582669) to concurrently pursue several cache efficiency improvements and opened an [epic](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/764) for it. The goal was to avoid chronic memory saturation, and the main action items were:

* Iteratively reduce the cache’s [default TTL](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1854) from 2 weeks to 8 hours (helped a lot!)

* Switch to [client-side caching](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_1026821730) for certain cache keys (efficiently avoids spending shared cache space on non-shared cache entries)

* [Partition](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/762) a set of cache keys to a separate Redis instance

The TTL reduction was the simplest solution and turned out to be a big win. One of our main concerns with TTL reduction was that the additional cache misses could potentially increase workload on other parts of the infrastructure. Some cache misses are more expensive than others, and our metrics are not granular enough to quantify the cost of cache misses per type of cache entry. This concern is why we applied the TTL adjustment incrementally and monitored for SLO violations. Fortunately, our inference was correct: Reducing TTL did not significantly reduce the cache hit rate, and the additional cache misses did not cause noticeable impact to downstream subsystems.

The TTL reduction turned out to be sufficient to drop memory usage consistently a little below its saturation point.

Increasing `maxmemory` had initially not been feasible because the original peak memory demand (prior to the efficiency improvements) was expected to be larger than the max size of the VMs we use for Redis. However, once we dropped memory demand below saturation, we could confidently [provision headroom](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1868) for future growth and re-enable [saturation alerting](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1883).

## Results

The following graph shows Redis memory usage transitioning out of its chronically saturated state, with annotations describing the milestones when latency spikes ended and when the saturation margin became wide enough to be considered safe:

![Redis memory usage stops showing a flat top saturation](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/07_epic_results__memory_saturation_avoided_by_TTL_reductions.png)

Zooming into the days when we rolled out the TTL adjustments, we can see the harmful eviction-driven latency spikes vanish as we drop memory usage below its saturation point, exactly as predicted:

![Redis memory usage starts as a flat line and then falls below that saturation line](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/08_results__redis_memory_usage_stops_saturating.png)

![Redis response time spikes stop occurring at the exact point when memory stops being saturated](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/09_results__redis_latency_spikes_end.png)

These eviction-driven latency spikes had been the biggest cause of slowness in the Redis cache.

Solving this source of slowness significantly improved the user experience. This 1-year lookback shows only the long-tail portion of the improvement, not the full benefit. Each weekday had roughly 2 million Redis requests slower than 1 second, until our fix in mid-August:

![Graph of the daily count of Redis cache requests slower than 1 second, showing roughly 2 million slow requests per day on weekdays until mid-August, when the TTL adjustments were applied](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/10_results__1_year_retrospective_of_slow_redis_requests_per_day.png)

## Conclusions

We solved a long-standing latency problem that had been worsening as the workload grew, and we learned a lot along the way. This article focuses mostly on the Redis discoveries, since those are general behaviors that some of you may encounter in your travels. We also developed some novel tools and analytical methods and uncovered several useful environment-specific facts about our workload, infrastructure, and observability, leading to several additional improvements and proposals not mentioned above.

Overall, we made several efficiency improvements and broke the cycle that was driving the pathology. Memory demand now stays well below the saturation point, eliminating the latency spikes that were burning error budgets for the development teams and causing intermittent slowness for users. All stakeholders are happy, and we came away with deeper domain knowledge and sharper skills!

## Key insights summary

The following notes summarize what we learned about Redis eviction behavior (current as of version 6.2):

* The same memory budget (`maxmemory`) is shared by key storage and client connection buffers.
A spike in demand for client connection buffers counts towards the `maxmemory` limit, in the same way that a spike in key inserts or key size would.

* Redis performs evictions in the foreground on its main thread. All time spent in `performEvictions` is time not spent handling client requests. Consequently, during an eviction burst, Redis has a lower throughput ceiling.

* If eviction overhead saturates the main thread’s CPU, then response rate falls below request arrival rate. Redis accumulates a request backlog (which consumes memory), and clients experience this as slowness.

* The memory used for pending requests requires more evictions, driving the eviction burst until enough clients are stalled that arrival rate falls back to match response rate. At that equilibrium point, evictions stop, eviction overhead vanishes, Redis rapidly handles its request backlog, and that backlog’s memory gets freed.

* Triggering this cycle requires all of the following:
  * Redis is configured with a `maxmemory` limit, and its memory demand exceeds that size. This memory saturation causes evictions to begin.
  * The Redis main thread’s CPU utilization is high enough under its normal workload that having to also perform evictions drives it to CPU saturation. This reduces response rate below request rate, causing a growing request backlog and high latency.
  * Many active clients are connected. The duration of the eviction burst and the amount of memory spent on client connection buffers increase proportionally to the number of active clients.
* Prevent this cycle by avoiding either memory or CPU saturation. In our case, avoiding memory saturation was easier (mainly by reducing cache TTL).
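On the monitoring side, the key signal for the first condition is the margin between `used_memory` and `maxmemory`. A quick sketch for eyeballing it on a live instance (a real alert would use exported metrics rather than this one-liner):

```sh
# Report used_memory as a percentage of maxmemory ($2 + 0 tolerates the
# trailing carriage returns in INFO output).
redis-cli INFO memory | awk -F: '
  /^used_memory:/ { used = $2 + 0 }
  /^maxmemory:/   { max  = $2 + 0 }
  END { if (max > 0) printf "memory saturation: %.1f%%\n", 100 * used / max }'
```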
## Further reading

The following lists summarize the analytical tools and methods cited in this article. These tools are all highly versatile, and any of them can provide a massive level-up when working on performance engineering problems.

Tools:

* [perf](https://www.brendangregg.com/perf.html) - A Linux performance analysis multitool. In this article, we used `perf` as a sampling profiler, capturing periodic stack traces of the `redis-server` process's main thread when it is actively running on a CPU.

* [Flamescope](https://github.com/Netflix/flamescope) - A visualization tool for rendering a `perf` profile (and other formats) into an interactive subsecond heat map. This tool invites the user to explore the timeline for microbursts of activity or inactivity and render flamegraphs of those interesting timespans to explore what code paths were active.

* [BCC](https://github.com/iovisor/bcc) - BCC is a framework for building BPF tools, and it ships with many useful tools out of the box. In this article, we used `funclatency` to measure the call durations of a specific Redis function and render the results as a histogram.

* [bpftrace](https://github.com/iovisor/bpftrace) - Another BPF framework, ideal for answering ad-hoc questions about your system's behavior. It uses an `awk`-like syntax and is [quick to learn](https://github.com/iovisor/bpftrace#readme). In this article, we wrote a [custom `bpftrace` script](https://gitlab.com/gitlab-com/gl-infra/scalability/uploads/cab2cd03231f8dd4819f77b44d768cb9/redis_snoop.getMaxmemoryState.sha_25a228b839a93a1395907a03f83e1eee448b0f14.production_thresholds.bt) for observing the variables used in computing how much memory to free during each round of evictions. This script's instrumentation points are specific to our particular build of `redis-server`, but the [approach can be generalized](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_982498636) and illustrates how versatile this tool can be.

Usage examples:

* [Example](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_854745083) - Walkthrough of using `perf` and `flamescope` to capture, filter, and visualize the stack sampling CPU profiles of the Redis main thread.

* [Example](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_857869826) - Walkthrough (including safety check) of using `funclatency` to measure the durations of the frequent calls to function `performEvictions`.

* [Example](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7172#note_971197943) - Experiment for adjusting Redis settings `lazyfree-lazy-eviction` and `maxmemory-eviction-tenacity` and observing the results using `perf`, `funclatency`, `funcslower`, and the Redis metrics for eviction count and memory usage.

* [Example](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_982498636) - A working example (script included) of using `bpftrace` to observe the values of a function's variables. In this case we inspected the `mem_tofree` calculation at the start of `performEvictions`. Also, these [companion notes](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_982499538) discuss some build-specific considerations.

* [Example](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_987049036) - Describes the latency injection experiment (the first of the three ideas). This experiment confirmed that memory demand increases at the predicted rate when we slow response rate to below request arrival rate, in the same way evictions do.
This result confirmed that the request queuing itself is the source of the memory pressure that amplifies the eviction burst once it begins.
Management",{"href":135,"dataGaLocation":44,"dataGaName":133},"/solutions/source-code-management/",{"text":137,"config":138},"Automated Software Delivery",{"href":121,"dataGaLocation":44,"dataGaName":139},"Automated software delivery",{"title":141,"description":142,"link":143,"items":148},"Security","Deliver code faster without compromising security",{"config":144},{"href":145,"dataGaName":146,"dataGaLocation":44,"icon":147},"/solutions/application-security-testing/","security and compliance","ShieldCheckLight",[149,153,158],{"text":150,"config":151},"Application Security Testing",{"href":145,"dataGaName":152,"dataGaLocation":44},"Application security testing",{"text":154,"config":155},"Software Supply Chain Security",{"href":156,"dataGaLocation":44,"dataGaName":157},"/solutions/supply-chain/","Software supply chain security",{"text":159,"config":160},"Software Compliance",{"href":161,"dataGaName":162,"dataGaLocation":44},"/solutions/software-compliance/","software compliance",{"title":164,"link":165,"items":170},"Measurement",{"config":166},{"icon":167,"href":168,"dataGaName":169,"dataGaLocation":44},"DigitalTransformation","/solutions/visibility-measurement/","visibility and measurement",[171,175,179],{"text":172,"config":173},"Visibility & Measurement",{"href":168,"dataGaLocation":44,"dataGaName":174},"Visibility and Measurement",{"text":176,"config":177},"Value Stream Management",{"href":178,"dataGaLocation":44,"dataGaName":176},"/solutions/value-stream-management/",{"text":180,"config":181},"Analytics & Insights",{"href":182,"dataGaLocation":44,"dataGaName":183},"/solutions/analytics-and-insights/","Analytics and insights",{"title":185,"items":186},"GitLab for",[187,192,197],{"text":188,"config":189},"Enterprise",{"href":190,"dataGaLocation":44,"dataGaName":191},"/enterprise/","enterprise",{"text":193,"config":194},"Small Business",{"href":195,"dataGaLocation":44,"dataGaName":196},"/small-business/","small business",{"text":198,"config":199},"Public Sector",{"href":200,"dataGaLocation":44,"dataGaName":201},"/solutions/public-sector/","public sector",{"text":203,"config":204},"Pricing",{"href":205,"dataGaName":206,"dataGaLocation":44,"dataNavLevelOne":206},"/pricing/","pricing",{"text":208,"config":209,"link":211,"lists":215,"feature":299},"Resources",{"dataNavLevelOne":210},"resources",{"text":212,"config":213},"View all resources",{"href":214,"dataGaName":210,"dataGaLocation":44},"/resources/",[216,249,271],{"title":217,"items":218},"Getting started",[219,224,229,234,239,244],{"text":220,"config":221},"Install",{"href":222,"dataGaName":223,"dataGaLocation":44},"/install/","install",{"text":225,"config":226},"Quick start guides",{"href":227,"dataGaName":228,"dataGaLocation":44},"/get-started/","quick setup checklists",{"text":230,"config":231},"Learn",{"href":232,"dataGaLocation":44,"dataGaName":233},"https://university.gitlab.com/","learn",{"text":235,"config":236},"Product documentation",{"href":237,"dataGaName":238,"dataGaLocation":44},"https://docs.gitlab.com/","product documentation",{"text":240,"config":241},"Best practice videos",{"href":242,"dataGaName":243,"dataGaLocation":44},"/getting-started-videos/","best practice videos",{"text":245,"config":246},"Integrations",{"href":247,"dataGaName":248,"dataGaLocation":44},"/integrations/","integrations",{"title":250,"items":251},"Discover",[252,257,261,266],{"text":253,"config":254},"Customer success stories",{"href":255,"dataGaName":256,"dataGaLocation":44},"/customers/","customer success 
stories",{"text":258,"config":259},"Blog",{"href":260,"dataGaName":5,"dataGaLocation":44},"/blog/",{"text":262,"config":263},"Remote",{"href":264,"dataGaName":265,"dataGaLocation":44},"https://handbook.gitlab.com/handbook/company/culture/all-remote/","remote",{"text":267,"config":268},"TeamOps",{"href":269,"dataGaName":270,"dataGaLocation":44},"/teamops/","teamops",{"title":272,"items":273},"Connect",[274,279,284,289,294],{"text":275,"config":276},"GitLab Services",{"href":277,"dataGaName":278,"dataGaLocation":44},"/services/","services",{"text":280,"config":281},"Community",{"href":282,"dataGaName":283,"dataGaLocation":44},"/community/","community",{"text":285,"config":286},"Forum",{"href":287,"dataGaName":288,"dataGaLocation":44},"https://forum.gitlab.com/","forum",{"text":290,"config":291},"Events",{"href":292,"dataGaName":293,"dataGaLocation":44},"/events/","events",{"text":295,"config":296},"Partners",{"href":297,"dataGaName":298,"dataGaLocation":44},"/partners/","partners",{"backgroundColor":300,"textColor":301,"text":302,"image":303,"link":307},"#2f2a6b","#fff","Insights for the future of software development",{"altText":304,"config":305},"the source promo card",{"src":306},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758208064/dzl0dbift9xdizyelkk4.svg",{"text":308,"config":309},"Read the latest",{"href":310,"dataGaName":311,"dataGaLocation":44},"/the-source/","the source",{"text":313,"config":314,"lists":316},"Company",{"dataNavLevelOne":315},"company",[317],{"items":318},[319,324,330,332,337,342,347,352,357,362,367],{"text":320,"config":321},"About",{"href":322,"dataGaName":323,"dataGaLocation":44},"/company/","about",{"text":325,"config":326,"footerGa":329},"Jobs",{"href":327,"dataGaName":328,"dataGaLocation":44},"/jobs/","jobs",{"dataGaName":328},{"text":290,"config":331},{"href":292,"dataGaName":293,"dataGaLocation":44},{"text":333,"config":334},"Leadership",{"href":335,"dataGaName":336,"dataGaLocation":44},"/company/team/e-group/","leadership",{"text":338,"config":339},"Team",{"href":340,"dataGaName":341,"dataGaLocation":44},"/company/team/","team",{"text":343,"config":344},"Handbook",{"href":345,"dataGaName":346,"dataGaLocation":44},"https://handbook.gitlab.com/","handbook",{"text":348,"config":349},"Investor relations",{"href":350,"dataGaName":351,"dataGaLocation":44},"https://ir.gitlab.com/","investor relations",{"text":353,"config":354},"Trust Center",{"href":355,"dataGaName":356,"dataGaLocation":44},"/security/","trust center",{"text":358,"config":359},"AI Transparency Center",{"href":360,"dataGaName":361,"dataGaLocation":44},"/ai-transparency-center/","ai transparency center",{"text":363,"config":364},"Newsletter",{"href":365,"dataGaName":366,"dataGaLocation":44},"/company/contact/","newsletter",{"text":368,"config":369},"Press",{"href":370,"dataGaName":371,"dataGaLocation":44},"/press/","press",{"text":373,"config":374,"lists":375},"Contact us",{"dataNavLevelOne":315},[376],{"items":377},[378,381,386],{"text":51,"config":379},{"href":53,"dataGaName":380,"dataGaLocation":44},"talk to sales",{"text":382,"config":383},"Support portal",{"href":384,"dataGaName":385,"dataGaLocation":44},"https://support.gitlab.com","support portal",{"text":387,"config":388},"Customer portal",{"href":389,"dataGaName":390,"dataGaLocation":44},"https://customers.gitlab.com/customers/sign_in/","customer portal",{"close":392,"login":393,"suggestions":400},"Close",{"text":394,"link":395},"To search repositories and projects, login 
to",{"text":396,"config":397},"gitlab.com",{"href":58,"dataGaName":398,"dataGaLocation":399},"search login","search",{"text":401,"default":402},"Suggestions",[403,405,409,411,415,419],{"text":73,"config":404},{"href":78,"dataGaName":73,"dataGaLocation":399},{"text":406,"config":407},"Code Suggestions (AI)",{"href":408,"dataGaName":406,"dataGaLocation":399},"/solutions/code-suggestions/",{"text":125,"config":410},{"href":127,"dataGaName":125,"dataGaLocation":399},{"text":412,"config":413},"GitLab on AWS",{"href":414,"dataGaName":412,"dataGaLocation":399},"/partners/technology-partners/aws/",{"text":416,"config":417},"GitLab on Google Cloud",{"href":418,"dataGaName":416,"dataGaLocation":399},"/partners/technology-partners/google-cloud-platform/",{"text":420,"config":421},"Why GitLab?",{"href":86,"dataGaName":420,"dataGaLocation":399},{"freeTrial":423,"mobileIcon":428,"desktopIcon":433,"secondaryButton":436},{"text":424,"config":425},"Start free trial",{"href":426,"dataGaName":49,"dataGaLocation":427},"https://gitlab.com/-/trials/new/","nav",{"altText":429,"config":430},"Gitlab Icon",{"src":431,"dataGaName":432,"dataGaLocation":427},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203874/jypbw1jx72aexsoohd7x.svg","gitlab icon",{"altText":429,"config":434},{"src":435,"dataGaName":432,"dataGaLocation":427},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203875/gs4c8p8opsgvflgkswz9.svg",{"text":437,"config":438},"Get Started",{"href":439,"dataGaName":440,"dataGaLocation":427},"https://gitlab.com/-/trial_registrations/new?glm_source=about.gitlab.com/compare/gitlab-vs-github/","get started",{"freeTrial":442,"mobileIcon":446,"desktopIcon":448},{"text":443,"config":444},"Learn more about GitLab Duo",{"href":78,"dataGaName":445,"dataGaLocation":427},"gitlab duo",{"altText":429,"config":447},{"src":431,"dataGaName":432,"dataGaLocation":427},{"altText":429,"config":449},{"src":435,"dataGaName":432,"dataGaLocation":427},{"freeTrial":451,"mobileIcon":456,"desktopIcon":458},{"text":452,"config":453},"Back to pricing",{"href":205,"dataGaName":454,"dataGaLocation":427,"icon":455},"back to pricing","GoBack",{"altText":429,"config":457},{"src":431,"dataGaName":432,"dataGaLocation":427},{"altText":429,"config":459},{"src":435,"dataGaName":432,"dataGaLocation":427},"content:shared:en-us:main-navigation.yml","Main Navigation","shared/en-us/main-navigation.yml","shared/en-us/main-navigation",{"_path":465,"_dir":38,"_draft":6,"_partial":6,"_locale":7,"title":466,"button":467,"image":472,"config":476,"_id":478,"_type":30,"_source":32,"_file":479,"_stem":480,"_extension":35},"/shared/en-us/banner","is now in public beta!",{"text":468,"config":469},"Try the Beta",{"href":470,"dataGaName":471,"dataGaLocation":44},"/gitlab-duo/agent-platform/","duo banner",{"altText":473,"config":474},"GitLab Duo Agent Platform",{"src":475},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1753720689/somrf9zaunk0xlt7ne4x.svg",{"layout":477},"release","content:shared:en-us:banner.yml","shared/en-us/banner.yml","shared/en-us/banner",{"_path":482,"_dir":38,"_draft":6,"_partial":6,"_locale":7,"data":483,"_id":721,"_type":30,"title":722,"_source":32,"_file":723,"_stem":724,"_extension":35},"/shared/en-us/main-footer",{"text":484,"source":485,"edit":491,"contribute":496,"config":501,"items":506,"minimal":713},"Git is a trademark of Software Freedom Conservancy and our use of 'GitLab' is under license",{"text":486,"config":487},"View page 
source",{"href":488,"dataGaName":489,"dataGaLocation":490},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/","page source","footer",{"text":492,"config":493},"Edit this page",{"href":494,"dataGaName":495,"dataGaLocation":490},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/content/","web ide",{"text":497,"config":498},"Please contribute",{"href":499,"dataGaName":500,"dataGaLocation":490},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/CONTRIBUTING.md/","please contribute",{"twitter":502,"facebook":503,"youtube":504,"linkedin":505},"https://twitter.com/gitlab","https://www.facebook.com/gitlab","https://www.youtube.com/channel/UCnMGQ8QHMAnVIsI3xJrihhg","https://www.linkedin.com/company/gitlab-com",[507,554,606,650,679],{"title":203,"links":508,"subMenu":523},[509,513,518],{"text":510,"config":511},"View plans",{"href":205,"dataGaName":512,"dataGaLocation":490},"view plans",{"text":514,"config":515},"Why Premium?",{"href":516,"dataGaName":517,"dataGaLocation":490},"/pricing/premium/","why premium",{"text":519,"config":520},"Why Ultimate?",{"href":521,"dataGaName":522,"dataGaLocation":490},"/pricing/ultimate/","why ultimate",[524],{"title":525,"links":526},"Contact Us",[527,530,532,534,539,544,549],{"text":528,"config":529},"Contact sales",{"href":53,"dataGaName":54,"dataGaLocation":490},{"text":382,"config":531},{"href":384,"dataGaName":385,"dataGaLocation":490},{"text":387,"config":533},{"href":389,"dataGaName":390,"dataGaLocation":490},{"text":535,"config":536},"Status",{"href":537,"dataGaName":538,"dataGaLocation":490},"https://status.gitlab.com/","status",{"text":540,"config":541},"Terms of use",{"href":542,"dataGaName":543,"dataGaLocation":490},"/terms/","terms of use",{"text":545,"config":546},"Privacy statement",{"href":547,"dataGaName":548,"dataGaLocation":490},"/privacy/","privacy statement",{"text":550,"config":551},"Cookie preferences",{"dataGaName":552,"dataGaLocation":490,"id":553,"isOneTrustButton":107},"cookie preferences","ot-sdk-btn",{"title":106,"links":555,"subMenu":563},[556,560],{"text":557,"config":558},"DevSecOps platform",{"href":71,"dataGaName":559,"dataGaLocation":490},"devsecops platform",{"text":129,"config":561},{"href":78,"dataGaName":562,"dataGaLocation":490},"ai-assisted development",[564],{"title":565,"links":566},"Topics",[567,572,577,581,586,591,596,601],{"text":568,"config":569},"CICD",{"href":570,"dataGaName":571,"dataGaLocation":490},"/topics/ci-cd/","cicd",{"text":573,"config":574},"GitOps",{"href":575,"dataGaName":576,"dataGaLocation":490},"/topics/gitops/","gitops",{"text":25,"config":578},{"href":579,"dataGaName":580,"dataGaLocation":490},"/topics/devops/","devops",{"text":582,"config":583},"Version Control",{"href":584,"dataGaName":585,"dataGaLocation":490},"/topics/version-control/","version control",{"text":587,"config":588},"DevSecOps",{"href":589,"dataGaName":590,"dataGaLocation":490},"/topics/devsecops/","devsecops",{"text":592,"config":593},"Cloud Native",{"href":594,"dataGaName":595,"dataGaLocation":490},"/topics/cloud-native/","cloud native",{"text":597,"config":598},"AI for Coding",{"href":599,"dataGaName":600,"dataGaLocation":490},"/topics/devops/ai-for-coding/","ai for coding",{"text":602,"config":603},"Agentic AI",{"href":604,"dataGaName":605,"dataGaLocation":490},"/topics/agentic-ai/","agentic 
ai",{"title":607,"links":608},"Solutions",[609,611,613,618,622,625,629,632,634,637,640,645],{"text":150,"config":610},{"href":145,"dataGaName":150,"dataGaLocation":490},{"text":139,"config":612},{"href":121,"dataGaName":122,"dataGaLocation":490},{"text":614,"config":615},"Agile development",{"href":616,"dataGaName":617,"dataGaLocation":490},"/solutions/agile-delivery/","agile delivery",{"text":619,"config":620},"SCM",{"href":135,"dataGaName":621,"dataGaLocation":490},"source code management",{"text":568,"config":623},{"href":127,"dataGaName":624,"dataGaLocation":490},"continuous integration & delivery",{"text":626,"config":627},"Value stream management",{"href":178,"dataGaName":628,"dataGaLocation":490},"value stream management",{"text":573,"config":630},{"href":631,"dataGaName":576,"dataGaLocation":490},"/solutions/gitops/",{"text":188,"config":633},{"href":190,"dataGaName":191,"dataGaLocation":490},{"text":635,"config":636},"Small business",{"href":195,"dataGaName":196,"dataGaLocation":490},{"text":638,"config":639},"Public sector",{"href":200,"dataGaName":201,"dataGaLocation":490},{"text":641,"config":642},"Education",{"href":643,"dataGaName":644,"dataGaLocation":490},"/solutions/education/","education",{"text":646,"config":647},"Financial services",{"href":648,"dataGaName":649,"dataGaLocation":490},"/solutions/finance/","financial services",{"title":208,"links":651},[652,654,656,658,661,663,665,667,669,671,673,675,677],{"text":220,"config":653},{"href":222,"dataGaName":223,"dataGaLocation":490},{"text":225,"config":655},{"href":227,"dataGaName":228,"dataGaLocation":490},{"text":230,"config":657},{"href":232,"dataGaName":233,"dataGaLocation":490},{"text":235,"config":659},{"href":237,"dataGaName":660,"dataGaLocation":490},"docs",{"text":258,"config":662},{"href":260,"dataGaName":5,"dataGaLocation":490},{"text":253,"config":664},{"href":255,"dataGaName":256,"dataGaLocation":490},{"text":262,"config":666},{"href":264,"dataGaName":265,"dataGaLocation":490},{"text":275,"config":668},{"href":277,"dataGaName":278,"dataGaLocation":490},{"text":267,"config":670},{"href":269,"dataGaName":270,"dataGaLocation":490},{"text":280,"config":672},{"href":282,"dataGaName":283,"dataGaLocation":490},{"text":285,"config":674},{"href":287,"dataGaName":288,"dataGaLocation":490},{"text":290,"config":676},{"href":292,"dataGaName":293,"dataGaLocation":490},{"text":295,"config":678},{"href":297,"dataGaName":298,"dataGaLocation":490},{"title":313,"links":680},[681,683,685,687,689,691,693,697,702,704,706,708],{"text":320,"config":682},{"href":322,"dataGaName":315,"dataGaLocation":490},{"text":325,"config":684},{"href":327,"dataGaName":328,"dataGaLocation":490},{"text":333,"config":686},{"href":335,"dataGaName":336,"dataGaLocation":490},{"text":338,"config":688},{"href":340,"dataGaName":341,"dataGaLocation":490},{"text":343,"config":690},{"href":345,"dataGaName":346,"dataGaLocation":490},{"text":348,"config":692},{"href":350,"dataGaName":351,"dataGaLocation":490},{"text":694,"config":695},"Sustainability",{"href":696,"dataGaName":694,"dataGaLocation":490},"/sustainability/",{"text":698,"config":699},"Diversity, inclusion and belonging (DIB)",{"href":700,"dataGaName":701,"dataGaLocation":490},"/diversity-inclusion-belonging/","Diversity, inclusion and 
belonging",{"text":353,"config":703},{"href":355,"dataGaName":356,"dataGaLocation":490},{"text":363,"config":705},{"href":365,"dataGaName":366,"dataGaLocation":490},{"text":368,"config":707},{"href":370,"dataGaName":371,"dataGaLocation":490},{"text":709,"config":710},"Modern Slavery Transparency Statement",{"href":711,"dataGaName":712,"dataGaLocation":490},"https://handbook.gitlab.com/handbook/legal/modern-slavery-act-transparency-statement/","modern slavery transparency statement",{"items":714},[715,717,719],{"text":540,"config":716},{"href":542,"dataGaName":543,"dataGaLocation":490},{"text":545,"config":718},{"href":547,"dataGaName":548,"dataGaLocation":490},{"text":550,"config":720},{"dataGaName":552,"dataGaLocation":490,"id":553,"isOneTrustButton":107},"content:shared:en-us:main-footer.yml","Main Footer","shared/en-us/main-footer.yml","shared/en-us/main-footer",[726],{"_path":727,"_dir":728,"_draft":6,"_partial":6,"_locale":7,"content":729,"config":733,"_id":735,"_type":30,"title":18,"_source":32,"_file":736,"_stem":737,"_extension":35},"/en-us/blog/authors/matt-smiley","authors",{"name":18,"config":730},{"headshot":731,"ctfId":732},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1749682529/Blog/Author%20Headshots/msmiley-headshot.jpg","msmiley",{"template":734},"BlogAuthor","content:en-us:blog:authors:matt-smiley.yml","en-us/blog/authors/matt-smiley.yml","en-us/blog/authors/matt-smiley",{"_path":739,"_dir":38,"_draft":6,"_partial":6,"_locale":7,"header":740,"eyebrow":741,"blurb":742,"button":743,"secondaryButton":747,"_id":749,"_type":30,"title":750,"_source":32,"_file":751,"_stem":752,"_extension":35},"/shared/en-us/next-steps","Start shipping better software faster","50%+ of the Fortune 100 trust GitLab","See what your team can do with the intelligent\n\n\nDevSecOps platform.\n",{"text":46,"config":744},{"href":745,"dataGaName":49,"dataGaLocation":746},"https://gitlab.com/-/trial_registrations/new?glm_content=default-saas-trial&glm_source=about.gitlab.com/","feature",{"text":51,"config":748},{"href":53,"dataGaName":54,"dataGaLocation":746},"content:shared:en-us:next-steps.yml","Next Steps","shared/en-us/next-steps.yml","shared/en-us/next-steps",{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"seo":754,"content":755,"config":758,"_id":29,"_type":30,"title":31,"_source":32,"_file":33,"_stem":34,"_extension":35},{"title":9,"description":10,"ogTitle":9,"ogDescription":10,"noIndex":6,"ogImage":11,"ogUrl":12,"ogSiteName":13,"ogType":14,"canonicalUrls":12,"schema":15},{"title":9,"description":10,"authors":756,"heroImage":11,"date":19,"body":20,"category":21,"tags":757},[18],[23,24,25],{"slug":27,"featured":6,"template":28},1761814430805]