Last updated: February 2026

ComfyUI Tutorial

If Midjourney is the iPhone of AI image generation — polished, simple, opinionated — then ComfyUI is the Android. It’s open source, infinitely customizable, runs locally on your hardware, and gives you control over every step of the generation process. The learning curve is steep. The payoff is worth it.

This guide gets you from zero to generating professional-quality images with ComfyUI. No fluff, no theory dumps — just the practical steps.

Why ComfyUI Over Midjourney or DALL-E

Cost: Free. Forever. No subscription, no credits, no per-image fees. Your only cost is electricity and the GPU you already own (or plan to buy).

Privacy: Everything runs locally. Your prompts, your images, your workflows — nothing leaves your machine. For anyone generating sensitive or proprietary content, this matters.

Control: ComfyUI uses a node-based workflow. Every step of the generation pipeline — model loading, prompt encoding, sampling, upscaling, post-processing — is a visible, configurable node. You can modify any step, add custom nodes, and create workflows that do things no cloud service offers.

Quality: With the right models and workflows, ComfyUI output matches or exceeds Midjourney. The difference is that Midjourney gives you good results with minimal effort, while ComfyUI gives you great results with significant effort.

What You Need

Hardware Requirements

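| Component  | Minimum    | Recommended | Ideal        |
| ---------- | ---------- | ----------- | ------------ |
| GPU VRAM   | 6 GB       | 12 GB       | 24 GB        |
| System RAM | 16 GB      | 32 GB       | 64 GB        |
| Storage    | 50 GB free | 200 GB SSD  | 500 GB+ NVMe |
| GPU        | RTX 3060   | RTX 4070    | RTX 4090     |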

Apple Silicon Macs work but are 2-3x slower than equivalent NVIDIA GPUs for image generation. If you’re on a Mac, it’s usable for experimentation but frustrating for production work.

No GPU? ComfyUI still runs in CPU-only mode, but generation takes 5-10 minutes per image instead of 10-30 seconds. That is not practical for iterative work.

Installation (5 Minutes)

Windows (Easiest)

  1. Download the latest portable Windows release from the ComfyUI GitHub releases page
  2. Extract the zip file
  3. Run run_nvidia_gpu.bat (or run_cpu.bat for CPU-only)
  4. Browser opens to http://127.0.0.1:8188

That’s it. No Python environment setup, no dependency hell. The standalone package includes everything.

Linux / Mac

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
python main.py

On Apple Silicon Macs, add the --force-fp16 flag (python main.py --force-fp16) to run in half precision.
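
Whichever install route you took, a quick way to confirm the server actually started is to ask the local address for a response. The sketch below uses only the Python standard library; the function name server_is_up is made up for this guide, and the URL is the default address mentioned in the Windows steps.

```python
import urllib.error
import urllib.request


def server_is_up(url: str = "http://127.0.0.1:8188", timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied, just not with a 2xx status.
        return True
    except (OSError, ValueError):
        # Connection refused, timeout, DNS failure, or a malformed URL.
        return False


if __name__ == "__main__":
    print("ComfyUI reachable:", server_is_up())
```

If this prints False right after launch, give the server a few more seconds; loading can take a while on first start.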

Your First Image (10 Minutes)

Step 1: Download a Model

ComfyUI needs a Stable Diffusion model (checkpoint). Download one:

  • SDXL 1.0 (general purpose, high quality): ~6.5 GB
  • Stable Diffusion 1.5 (faster, lower VRAM): ~4 GB
  • Pony Diffusion (stylized/anime): ~6.5 GB

Place the downloaded .safetensors file in ComfyUI/models/checkpoints/.

Best source: CivitAI or Hugging Face.
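
If the model does not show up in the Load Checkpoint dropdown, the usual cause is a file in the wrong folder. Here is a small sketch that lists what sits in the checkpoints folder; list_checkpoints is a hypothetical helper, and it assumes you pass the path to your own ComfyUI directory.

```python
from pathlib import Path


def list_checkpoints(comfy_root: str) -> list:
    """Return the .safetensors file names under <comfy_root>/models/checkpoints."""
    folder = Path(comfy_root) / "models" / "checkpoints"
    if not folder.is_dir():
        # Wrong root path, or the folder has not been created yet.
        return []
    return sorted(p.name for p in folder.iterdir() if p.suffix == ".safetensors")


if __name__ == "__main__":
    # "ComfyUI" is an assumed relative path; point it at your install.
    print(list_checkpoints("ComfyUI"))
```

An empty list means either the path is wrong or the download landed somewhere else.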

Step 2: Load the Default Workflow

When ComfyUI opens, you’ll see a node graph. The default workflow has everything you need:

  1. Load Checkpoint — Select your downloaded model
  2. CLIP Text Encode (Positive) — Your prompt (what you want)
  3. CLIP Text Encode (Negative) — Negative prompt (what you don’t want)
  4. KSampler — The generation engine
  5. VAE Decode — Converts latent image to visible image
  6. Save Image — Outputs the result
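
It helps to see the same graph as data. ComfyUI can export a workflow in an API format, which is roughly a dictionary of nodes where each link is written as [source node id, output index]. The sketch below rebuilds the default graph that way. The class_type and input names are assumptions based on ComfyUI's built-in nodes, so export your own graph in API format to confirm them. It also includes the Empty Latent Image node, which sets the canvas size and is part of the default graph even though the list above skips it. The checkpoint file name is only an example.

```python
import json


def build_txt2img_workflow(ckpt_name: str, positive: str, negative: str,
                           seed: int = 0, width: int = 1024, height: int = 1024) -> dict:
    """Return the default text-to-image graph as an API-format dict."""
    return {
        # Outputs of the checkpoint loader: 0 = model, 1 = CLIP, 2 = VAE.
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt_name}},
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": 25, "cfg": 7.5,
                         "sampler_name": "dpmpp_2m", "scheduler": "karras",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"filename_prefix": "ComfyUI", "images": ["6", 0]}},
    }


workflow = build_txt2img_workflow(
    "sd_xl_base_1.0.safetensors",  # example file name; use the one you downloaded
    "professional photograph of a mountain landscape at golden hour",
    "blurry, low quality, watermark",
)
print(json.dumps(workflow["5"], indent=2))  # inspect the KSampler node
```

Reading the graph this way makes the data flow obvious: the checkpoint feeds the two text encoders and the sampler, and the sampler's latent output goes through the VAE before it is saved.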

Step 3: Write Your First Prompt

In the positive prompt node:

professional photograph of a mountain landscape at golden hour, 
dramatic lighting, sharp focus, high detail, 8k resolution

In the negative prompt node:

blurry, low quality, watermark, text, deformed, ugly, 
oversaturated, cartoon

Step 4: Generate

Click Queue Prompt (or press Ctrl+Enter). Wait 10-30 seconds (depending on your GPU). Your image appears in the Save Image node.
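
The button is not the only way in. ComfyUI's bundled API example queues work by POSTing a JSON body of the form {"prompt": workflow} to the /prompt endpoint of the running server. The sketch below follows that pattern with the standard library. The helper names are invented for this guide, and the workflow argument is assumed to be an API-format graph exported from your own ComfyUI.

```python
import json
import urllib.request


def build_queue_request(workflow: dict,
                        server: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Build (but do not send) a POST request to the /prompt endpoint."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """Send the workflow to a running ComfyUI instance and return its JSON reply."""
    with urllib.request.urlopen(build_queue_request(workflow, server)) as response:
        return json.loads(response.read())
```

Splitting the request builder from the sender keeps the first half easy to inspect without a server running. Call queue_workflow only once ComfyUI is up.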

Essential Workflows

Text-to-Image (Basic)

The default workflow. Prompt → Generate → Image. Good for exploration and quick concepts.

Pro tip: The KSampler settings matter more than most tutorials admit:

  • Steps: 20-30 (more steps = more detail, diminishing returns after 30)
  • CFG Scale: 7-8 (how closely to follow the prompt; too high = oversaturated)
  • Sampler: euler_ancestral for creative variety, dpmpp_2m for consistency
  • Scheduler: karras (almost always the best choice)
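
The fastest way to build intuition for these settings is to change one thing at a time while holding the seed fixed. Here is a minimal sketch that enumerates the combinations to try; sampler_sweep is a hypothetical helper, and the dictionary keys mirror the KSampler input names as an assumption.

```python
from itertools import product


def sampler_sweep(steps_options, cfg_options, sampler_names, seed=42):
    """Return one KSampler settings dict per combination, all sharing one seed."""
    return [
        {"seed": seed, "steps": steps, "cfg": cfg,
         "sampler_name": sampler, "scheduler": "karras"}
        for steps, cfg, sampler in product(steps_options, cfg_options, sampler_names)
    ]


variants = sampler_sweep([20, 30], [7.0, 8.0], ["euler_ancestral", "dpmpp_2m"])
print(len(variants))  # 2 * 2 * 2 = 8 combinations
```

With the seed held constant, any visible difference between two outputs comes from the setting you changed rather than from random noise.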

Image-to-Image (Refinement)

Load an existing image, add noise, then regenerate. Useful for:

  • Refining AI-generated images (fix hands, faces, details)
  • Applying a style to a photograph
  • Iterating on a concept

Set denoise strength between 0.3 and 0.7. Lower = closer to original. Higher = more creative freedom.
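
In graph terms, img2img is a small change to text-to-image: the empty latent is swapped for an encoded copy of your image, and denoise drops below 1. The sketch below applies that change to an API-format graph. The class_type names (LoadImage, VAEEncode) and input names are assumptions to verify against your own export, the node ids "100" and "101" are assumed to be unused, and the image is assumed to sit in ComfyUI's input folder.

```python
import copy


def to_img2img(workflow: dict, sampler_id: str, checkpoint_id: str,
               image_name: str, denoise: float = 0.5) -> dict:
    """Return a copy of `workflow` whose sampler starts from an existing image."""
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be greater than 0 and at most 1")
    graph = copy.deepcopy(workflow)  # leave the caller's graph untouched
    graph["100"] = {"class_type": "LoadImage", "inputs": {"image": image_name}}
    # Encode the pixels to a latent with the checkpoint's VAE (output index 2).
    graph["101"] = {"class_type": "VAEEncode",
                    "inputs": {"pixels": ["100", 0], "vae": [checkpoint_id, 2]}}
    graph[sampler_id]["inputs"]["latent_image"] = ["101", 0]
    graph[sampler_id]["inputs"]["denoise"] = denoise
    return graph
```

Everything else in the graph, including both prompts, stays as it was.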

Upscaling (Make It Print-Ready)

ComfyUI can upscale images 2-4x while adding detail:

  1. Add a Load Upscale Model node (use RealESRGAN_x4plus)
  2. Connect it to an Upscale Image (using Model) node
  3. Optionally run through img2img at low denoise (0.2-0.3) for extra detail

A 1024x1024 SDXL image upscaled 4x becomes 4096x4096, which at 300 DPI prints at about 13.6 x 13.6 inches.
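
The print-size math is just pixels divided by DPI, so it is easy to check for any target size. A tiny sketch, with print_size_inches as a made-up helper name:

```python
def print_size_inches(pixels: int, dpi: int = 300) -> float:
    """Return the printed edge length in inches for a given pixel count."""
    return pixels / dpi


upscaled = 1024 * 4  # a 1024 px edge after a 4x upscale
print(upscaled)  # 4096
print(round(print_size_inches(upscaled), 2))  # 13.65
```

Run it in reverse to plan ahead: a 20-inch print at 300 DPI needs 6000 pixels on that edge.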

ControlNet (Precise Control)

ControlNet is ComfyUI’s superpower. It lets you control the composition using reference images:

  • Canny edge: Maintain the outline/structure of a reference image
  • Depth map: Maintain the spatial layout
  • OpenPose: Match a specific human pose
  • Scribble: Turn rough sketches into detailed images

This is how professionals use ComfyUI — not random generation, but controlled, intentional image creation.
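
In the graph, ControlNet sits between your positive prompt and the sampler: the prompt's conditioning passes through a node that also takes a ControlNet model and a hint image. The sketch below rewires an API-format graph that way. The class_type names (ControlNetLoader, ControlNetApply) and their input names are assumptions based on ComfyUI's built-in nodes, so confirm them against your own export. The node ids "200" to "202" are assumed to be free, and the hint image is assumed to be already preprocessed (for example, an edge map for a canny model).

```python
import copy


def add_controlnet(workflow: dict, sampler_id: str, control_net_name: str,
                   hint_image: str, strength: float = 1.0) -> dict:
    """Return a copy of `workflow` with ControlNet applied to the positive prompt."""
    graph = copy.deepcopy(workflow)
    positive_link = graph[sampler_id]["inputs"]["positive"]
    graph["200"] = {"class_type": "ControlNetLoader",
                    "inputs": {"control_net_name": control_net_name}}
    graph["201"] = {"class_type": "LoadImage", "inputs": {"image": hint_image}}
    graph["202"] = {"class_type": "ControlNetApply",
                    "inputs": {"conditioning": positive_link,
                               "control_net": ["200", 0],
                               "image": ["201", 0],
                               "strength": strength}}
    # The sampler now reads the ControlNet-adjusted conditioning.
    graph[sampler_id]["inputs"]["positive"] = ["202", 0]
    return graph
```

Lowering strength loosens the grip of the reference image, which is often the first thing to adjust when results look too rigid.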

Must-Have Custom Nodes

ComfyUI’s ecosystem of custom nodes is what makes it truly powerful. Install via ComfyUI Manager (the first custom node you should install):

  1. ComfyUI Manager — Browse and install other custom nodes from the UI
  2. WAS Node Suite — Text manipulation, image processing, math operations
  3. ComfyUI Impact Pack — Face detection, segmentation, detail enhancement
  4. Efficiency Nodes — Simplified versions of common workflows
  5. AnimateDiff — Turn still images into animations

Common Mistakes (And Fixes)

“My images look blurry/low quality”

  • Increase steps to 25-30
  • Use a better model (SDXL > SD 1.5 for quality)
  • Add quality keywords to prompt: “high detail, sharp focus, professional”

“Hands look wrong”

  • Use the FaceDetailer node from ComfyUI Impact Pack (the ComfyUI counterpart to Automatic1111's ADetailer extension), which detects and regenerates faces and hands
  • Add “deformed hands, extra fingers” to negative prompt
  • Use ControlNet with OpenPose for specific hand positions

“Generation is slow”

  • Launch with the --force-fp16 flag (half precision; usually noticeably faster with minimal quality loss)
  • Reduce image size (generate at 1024x1024, upscale after)
  • Close other GPU-intensive applications

“Out of VRAM”

  • Add --lowvram flag when starting ComfyUI
  • Use SD 1.5 instead of SDXL (roughly 4 GB of VRAM versus roughly 8 GB)
  • Reduce batch size to 1
  • Enable tiled VAE decoding
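
If you launch ComfyUI from a script, the memory flags can be chosen from the VRAM you have. This is a rough sketch only: launch_command is a hypothetical helper, and the 8 GB cut-off is an assumption taken from the SDXL figure above, not an official threshold. The --cpu and --lowvram flags are ComfyUI command-line options.

```python
from typing import List, Optional


def launch_command(vram_gb: Optional[float]) -> List[str]:
    """Return a ComfyUI launch command suited to the available VRAM."""
    command = ["python", "main.py"]
    if vram_gb is None:
        # No usable GPU: fall back to CPU-only mode.
        command.append("--cpu")
    elif vram_gb < 8:
        # Assumed threshold: below this, SDXL is likely to run out of memory.
        command.append("--lowvram")
    return command


print(" ".join(launch_command(6)))  # python main.py --lowvram
```

Treat the output as a starting point and adjust if you still hit out-of-memory errors.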

ComfyUI vs Automatic1111 vs Midjourney

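| Feature          | ComfyUI      | Automatic1111 | Midjourney  |
| ---------------- | ------------ | ------------- | ----------- |
| Interface        | Node-based   | Web UI        | Discord/Web |
| Learning curve   | Steep        | Moderate      | Easy        |
| Flexibility      | Unlimited    | High          | Limited     |
| Speed            | Fast         | Moderate      | Fast        |
| Cost             | Free (local) | Free (local)  | $10-60/mo   |
| Workflow saving  | Excellent    | Poor          | None        |
| Custom pipelines | Yes          | Limited       | No          |
| Community        | Growing fast | Mature        | Massive     |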

Choose ComfyUI if: You want maximum control, plan to create complex workflows, or need reproducible pipelines.

Choose Automatic1111 if: You want a simpler local UI and don’t need node-based workflows.

Choose Midjourney if: You want great results with minimal effort and don’t mind paying.

Getting Good (The Learning Path)

Week 1: Install, generate basic images, learn prompt engineering. Experiment with different models and samplers.

Week 2: Learn img2img and upscaling. Start saving and reusing workflows. Install ComfyUI Manager and explore custom nodes.

Week 3: Learn ControlNet. This is where ComfyUI goes from “toy” to “tool.” Practice with canny edge and depth maps.

Week 4: Build custom workflows for your specific use case. Combine multiple techniques. Start creating content you’d actually use.

Ongoing: Follow r/comfyui and the ComfyUI Discord for new techniques, models, and custom nodes. The ecosystem evolves weekly.

The Bottom Line

ComfyUI is the most powerful AI image generation tool available. It’s also the most complex. If you’re willing to invest 10-20 hours learning it, you’ll have a tool that can produce anything you can imagine — for free, locally, with complete control.

If that sounds like too much work, use Midjourney. No shame in it. But if you want to understand how AI image generation actually works and push it to its limits, ComfyUI is where the magic happens.
