WAR (wins above replacement) is the foundation that I start with. Baseball Reference's WAR (originally created by Sean Smith) is not a perfect stat, and proponents and skeptics alike have offered a multitude of reasons why it is imperfect. These points of contention include: what replacement level should be, how to separate run prevention responsibilities between pitchers and defense, how to handle sequencing, HR/FB% and other luck-heavy factors, the necessity and specifics of league quality adjustments, using correct runs-to-wins conversions, and more. I understand these issues and take all of them into account. By no means do I only look at WAR, even though that is the data that I show next to each pitcher's name and/or photo in my rankings. Pitcher offensive WAR is included in all data.
Since WAR focuses heavily on value over a replacement player, it does not always paint the most accurate picture of dominance or pure "greatness". Therefore, I use WAA (wins above average) as well, which strips out the replacement level "points" and focuses more on how well a pitcher pitched and a little less on how much he pitched. Adding them together gives a rough snapshot of a player's career value, weighted moderately for "dominance". I know others scale WAA to give it equal weight to WAR, which is fine with me. I chose to take the simple route. Others subtract negative WAA seasons from totals. I choose not to do this at this point, but can see the reason to do so.
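As a rough sketch of the arithmetic described above (not the exact worksheet behind these rankings), the WAR + WAA snapshot and the two variants others use could look like this in Python. The names `Season`, `career_score`, `waa_weight`, and `drop_negative_waa` are hypothetical, the season values are made up for illustration, and treating "subtract negative WAA seasons" as zeroing out below-average seasons is my reading of that approach.

```python
from dataclasses import dataclass


@dataclass
class Season:
    war: float  # wins above replacement for one season
    waa: float  # wins above average for the same season


def career_score(seasons, waa_weight=1.0, drop_negative_waa=False):
    """Career WAR plus (optionally weighted) career WAA.

    waa_weight=1.0 is the simple WAR + WAA route.
    A larger waa_weight is one way to give WAA more pull (assumption).
    drop_negative_waa=True ignores below-average seasons in the WAA total.
    """
    total_war = sum(s.war for s in seasons)
    waa_values = [s.waa for s in seasons]
    if drop_negative_waa:
        # Below-average seasons no longer pull the WAA total down.
        waa_values = [max(w, 0.0) for w in waa_values]
    return total_war + waa_weight * sum(waa_values)


# Made-up career, for illustration only (not a real pitcher).
career = [Season(5.0, 3.0), Season(2.0, -0.5), Season(1.0, -1.0)]

simple = career_score(career)
no_negatives = career_score(career, drop_negative_waa=True)
print(simple, no_negatives)
```

The default arguments reproduce the simple route taken here; the other two options only exist to show how the alternative approaches would change the total.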
Unfortunately, WAR + WAA is not even close to all-encompassing. From that jumping-off point, I look at all of the following factors (in no particular order):
1. Peak dominance beyond what WAA captures. This is why guys like Marichal and Koufax are much higher than what WAR + WAA would suggest. How great was the pitcher at his VERY best? How do his top 3, 5, or 7 seasons rate?
2. League quality
3. Postseason success
4. Park factor issues in BBRef WAR that may be unfairly helping or hurting a particular pitcher
6. A relief pitcher's WPA data
7. Missed time due to wars, labor issues, segregation, death, abrupt injuries (in very rare cases), being unfairly buried in the minors, etc.
9. Subjective elements such as historical opinion/perception
*I believe FanGraphs' FIP-focused WAR is more accurate for predicting future performance and for estimating talent and value for players with short careers to date. I think BBRef and BBG paint a more accurate picture for guys with long careers, such as the guys being evaluated for this list.
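The peak question in item 1 (how a pitcher's top 3, 5, or 7 seasons rate) can be sketched in a few lines of Python. This is only an illustration of the idea, not the method used to produce the rankings; `peak_total` is a hypothetical helper and the season values are made up.

```python
def peak_total(season_values, n):
    """Sum of the n best single-season values.

    If the career has fewer than n seasons, every season is counted.
    """
    return sum(sorted(season_values, reverse=True)[:n])


# Made-up season WAR values, for illustration only (not a real pitcher).
example_seasons = [7.5, 1.0, 9.0, 4.5, 6.0, 2.0, 8.0, 3.0]

for n in (3, 5, 7):
    print(f"top {n} seasons: {peak_total(example_seasons, n)}")
```

Comparing these totals across pitchers is one way to see who stands out at his best, independent of how long the career lasted.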
So why use WAR as my foundation if WAR is not a perfect stat?
A lot of things have to be considered when ranking pitchers. Some of these are: skill-driven run prevention, longevity/innings pitched "ability", park effects, team defense, strength of opponents played, league quality, bullpen support, luck, and a lot more. WAR considers most of these things and most of them fairly accurately. I could go into detail about how each of these factors impacts each pitcher's rankings separately, but a single number makes it a lot easier and not much less accurate. Whatever is not covered by WAR/WAA can be easily added to one's evaluation subjectively.