World 3 · Creative AI · Intermediate · Ages 10+ · Nano Banana Pro · Flux 2 · Midjourney v7

AI Image Generation Mastery 2026

Master the #1-ranked tools — Nano Banana Pro, Flux 2, Midjourney v7, Ideogram 3, Recraft V4 and Reve. The most comprehensive AI image course available.

8 hours · 17 lessons · 1400 XP total

Course Syllabus

17 lessons
1. 2026 AI Art Wars: Nano Banana Pro, Flux 2, MJ v7, Ideogram 3 and Reve Ranked

18 min · 35 XP

A data-driven comparison of every major AI image model in 2026. Know which tool wins for photorealism, illustration, typography, and speed.

  • Nano Banana Pro leads 2026 benchmarks with an overall score of 9.50/10, excelling at photorealism, prompt adherence, and human detail — making it the top choice for portrait photography, editorial imagery, and any scene requiring lifelike skin, eyes, and hair.
  • Flux 2 wins the commercial photography category with a physically-based lighting simulation engine that produces studio-quality product shots, architectural visualizations, and fashion editorial imagery that clients can use directly without post-processing.
  • Midjourney v7 retains its crown for artistic and aesthetic work — its distinctive visual sensibility, painterly quality, and cinematic composition make it the preferred tool for concept art, album covers, book illustrations, and any project where artistic appeal matters most.
  • Ideogram 3 is the only major AI image model that reliably renders readable text inside images, achieving 90%+ accuracy — a breakthrough that makes it indispensable for YouTube thumbnails, event posters, book covers, and any marketing asset with typography.
  • Reve was engineered with prompt adherence as its primary objective — it handles complex multi-element compositions with precise spatial relationships ('a blue chair to the left of a red table with a cat sitting underneath') more reliably than any competitor.
  • Recraft V4 is the only AI image tool that outputs true SVG vector files — making it the go-to for logo design, icon sets, and brand identity work where assets must scale perfectly from a business card to a billboard without quality loss.
  • Cost structures vary dramatically: Midjourney offers unlimited generations for $10/month, while Flux and others charge per-image via API — making Midjourney far more economical for high-volume creative work, while API models are better for automated pipelines.
  • Choosing the right tool for each task is the most important skill in AI image work — using Midjourney when you need Ideogram's text rendering or using Recraft when you need Nano Banana Pro's photorealism wastes time and produces inferior results.
2. The Master Prompt Formula: Subject, Style, Lighting, Mood, Camera, Render

20 min · 40 XP

The universal 6-part formula that works across every AI image tool. Master this and you'll get professional results on day one.

  • Subject is the foundation of every image prompt — describe your subject with precise visual details: age, gender, hair color and style, clothing, expression, and pose. 'A woman' gives mediocre results; 'a 30-year-old woman with curly red hair, wearing a vintage leather jacket, laughing mid-sentence' gives exactly what you envisioned.
  • Style defines the visual language of the image — reference a specific art movement (Impressionism, Art Deco, Brutalism), artistic medium (oil painting, charcoal sketch, watercolor), or cultural aesthetic (Studio Ghibli, Wes Anderson, Cyberpunk) to instantly anchor the entire visual direction.
  • Lighting is the single most powerful quality upgrade in any image prompt — specifying 'golden hour backlit portrait' transforms a flat daytime image into something cinematic. Lighting terms like Rembrandt, chiaroscuro, volumetric, and bioluminescent each produce distinctively different atmospheres.
  • Mood sets the emotional register of the image — words like 'melancholic', 'frenetic', 'serene', 'ominous', or 'euphoric' communicate atmosphere that purely visual descriptions miss, and AI models are surprisingly good at translating emotional adjectives into visual choices.
  • Camera and lens specifications dramatically affect composition and feel — '85mm portrait lens with f/1.4 bokeh' produces very different results than '14mm ultra-wide with deep focus', and mentioning the angle (bird's eye, worm's eye, Dutch angle) controls how powerful or vulnerable the subject feels.
  • Render engine hints signal the desired quality level and technical style — 'Octane render' suggests photorealistic 3D, 'Unreal Engine' suggests game-quality realism, 'RAW photo' suggests documentary authenticity, and '8K ultra-detailed' signals maximum resolution and sharpness.
  • The 6-part formula works best as natural language, not a comma-separated keyword list — 'A photorealistic portrait of a 35-year-old architect standing in her modernist studio, shot on an 85mm lens with warm golden afternoon light streaming through floor-to-ceiling windows, shallow depth of field, calm and focused mood' outperforms a list of keywords.
  • Iterate systematically using the 6-part formula — if the result is wrong, identify which component failed (is the subject wrong? the lighting? the style?) and adjust only that one component in the next prompt, learning how each part controls the output.
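As a sketch of the idea, the 6-part formula can be assembled programmatically so each component stays independently adjustable during iteration. The function name and example values below are illustrative, not part of any tool's API.

```python
def build_prompt(subject, style, lighting, mood, camera, render):
    """Compose the 6-part formula into a single natural-language prompt.

    Parts are joined as flowing clauses rather than a bare keyword list,
    which most models follow more reliably. Empty parts are skipped so
    you can iterate on one component at a time.
    """
    parts = [subject, style, lighting, mood, camera, render]
    return ", ".join(p.strip() for p in parts if p and p.strip())

prompt = build_prompt(
    subject="a 35-year-old architect standing in her modernist studio",
    style="photorealistic editorial photography",
    lighting="warm golden afternoon light through floor-to-ceiling windows",
    mood="calm and focused",
    camera="85mm lens, shallow depth of field",
    render="RAW photo, ultra-detailed",
)
```

To debug a bad result, change exactly one keyword argument and regenerate, matching the one-component-at-a-time iteration described above.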
3. Nano Banana Pro Deep Dive: Why It Scores 9.50 and How to Max It Out

25 min · 50 XP

Explore the architecture behind Nano Banana Pro's industry-leading scores. Learn the specific prompting techniques that unlock its full potential.

  • Nano Banana Pro's hybrid diffusion-autoregressive architecture gives it an unusual advantage: it processes the full prompt compositionally before generating, producing stronger spatial reasoning and more accurate subject placement than pure diffusion models.
  • Native 4K output means you get professional-resolution images without any upscaling artifacts — Nano Banana Pro's 8K lossless upscaling option produces billboard-quality files from a single generation, eliminating the need for third-party upscalers.
  • Nano Banana Pro's strongest capability is photorealistic human rendering — skin pores, hair strands, natural eye reflections, and micro-expressions are rendered with a level of detail that sets it apart from competitors on portrait and fashion work.
  • Write prompts for Nano Banana Pro in natural flowing sentences rather than comma-separated keyword lists — its natural language understanding is strong enough that descriptive sentences produce more coherent, accurately composed results than tokenized keyword strings.
  • Negative prompts dramatically improve output quality — include 'blurry, deformed hands, extra fingers, distorted face, watermark, text overlay, oversaturated, plastic skin' in your negative prompt to eliminate the most common AI image artifacts.
  • Seed values lock the random starting noise for a generation — using the same seed with a slightly modified prompt produces a new image that shares the same underlying composition and character placement, enabling iterative refinement.
  • Batch generation produces 4 simultaneous variations from a single prompt — always generate a batch rather than a single image and select the strongest result, as variation between seeds is significant and the best of 4 consistently outperforms a single generation.
  • The 'consistency mode' in Nano Banana Pro locks character identity across multiple prompts — generate your character once, save the seed and character reference, then use consistency mode to place them in any new setting while maintaining their appearance.
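The negative-prompt advice above can be captured as a small reusable helper. This is a generic sketch, not part of any Nano Banana Pro SDK: it merges the artifact list from the lesson with scene-specific exclusions while dropping duplicates.

```python
# Base artifact exclusions quoted in the lesson above.
BASE_NEGATIVE = [
    "blurry", "deformed hands", "extra fingers", "distorted face",
    "watermark", "text overlay", "oversaturated", "plastic skin",
]

def negative_prompt(*extra_terms):
    """Merge the base artifact list with scene-specific exclusions,
    dropping duplicates while preserving order."""
    seen, merged = set(), []
    for term in BASE_NEGATIVE + list(extra_terms):
        if term not in seen:
            seen.add(term)
            merged.append(term)
    return ", ".join(merged)

# 'watermark' is already in the base list, so it appears only once.
demo = negative_prompt("harsh shadows", "watermark")
```

Keeping the base list in one place means an artifact you discover in one project benefits every later prompt.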
4. Flux 2 Masterclass: Hyperrealism, Lighting Physics and Commercial Workflows

25 min · 50 XP

Flux 2's lighting physics engine produces studio-quality commercial imagery. Master product photography, architecture, and fashion workflows.

  • Flux 2's physically-based lighting engine simulates how light actually behaves in the real world — shadows have soft penumbras, reflective surfaces show accurate environment maps, and translucent materials scatter light with subsurface realism.
  • Flux 2 is the top choice for commercial applications: product photography (food, cosmetics, electronics), architectural visualization, automotive renders, and fashion editorial — fields where photorealistic accuracy directly impacts business outcomes.
  • Aspect ratio selection should match your final delivery platform — use 1:1 for Instagram posts, 4:5 for Instagram portrait, 9:16 for TikTok and Stories, 16:9 for YouTube thumbnails and presentations, and 3:2 for traditional photography prints.
  • The CFG (Classifier-Free Guidance) scale controls how literally the model interprets your prompt — low CFG (3-5) allows creative interpretation and stylistic variation, while high CFG (10-15) forces strict adherence to every word but can produce over-saturated, artifacted results.
  • ControlNet integration gives Flux 2 precise compositional control — use OpenPose to extract a body pose from one image and apply it to a new character, or use Canny edge detection to preserve an existing composition while completely changing the style.
  • Flux 2's API enables automated batch commercial workflows — integrate it into your production pipeline to generate hundreds of product variation images (different colors, backgrounds, angles) automatically without manual prompting for each.
  • Model fine-tuning on Flux 2 is available through services like Replicate and fal.ai — train a custom model on your brand's visual style or a specific product with as few as 20 reference images, producing perfectly on-brand imagery at scale.
  • For the best commercial results, combine Flux 2 with professional post-processing: generate at maximum resolution, run through Topaz Gigapixel AI for additional upscaling, then do final color grading in Adobe Lightroom or Capture One.
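The batch product-variation workflow reduces to fanning one template out over every (color, background) combination. This sketch shows only the prompt generation; the actual API call, endpoint, and authentication are omitted because they depend on the provider you use.

```python
from itertools import product

def product_variations(item, colors, backgrounds):
    """Enumerate one prompt per (color, background) combination,
    ready to feed into an image-API batch loop."""
    return [
        f"studio product photo of a {color} {item} on a {bg} background, "
        f"softbox lighting, 3:2 aspect ratio"
        for color, bg in product(colors, backgrounds)
    ]

prompts = product_variations(
    "ceramic mug",
    colors=["matte black", "sage green"],
    backgrounds=["white", "walnut"],
)
```

Two colors and two backgrounds yield four prompts; scaling the lists to ten of each yields a hundred, which is where automated pipelines pay off over manual prompting.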
5. Midjourney v7: Sref (Style Reference), Cref (Character Reference) and Aesthetics

28 min · 55 XP

Midjourney's v7 parameters unlock cinematic consistency. Master sref for brand styles and cref for persistent characters.

  • `--sref [URL]` is Midjourney v7's style reference parameter — paste the URL of any image and Midjourney will extract and lock the visual style (color palette, brush strokes, aesthetic feeling) while applying it to your new prompt, enabling perfect brand consistency across projects.
  • `--cref [URL]` is the character reference parameter — provide a portrait image URL and Midjourney will maintain that person's face, hairstyle, and distinctive features across multiple different scenes, poses, and settings in new generations.
  • `--ar` sets the aspect ratio of the output — `--ar 16:9` for widescreen, `--ar 1:1` for square, `--ar 9:16` for vertical mobile, `--ar 3:2` for traditional photo format. Choosing the right ratio before generating avoids awkward cropping.
  • `--stylize` (shorthand `--s`) controls how aggressively Midjourney applies its aesthetic opinions — `--s 0` produces a literal, conservative interpretation of your prompt; `--s 1000` gives Midjourney maximum creative license to make the image beautiful by its own standards.
  • `--chaos` (shorthand `--c`) controls the visual diversity between your 4 generated options — `--c 0` produces 4 very similar variations, while `--c 100` produces 4 radically different interpretations, useful when exploring ideas rather than refining a specific vision.
  • `--weird` (shorthand `--w`) introduces unexpected, surreal, and unconventional elements — values from 250 to 3000, where higher values produce increasingly bizarre outputs. Essential for experimental art, dreamlike imagery, and breaking away from predictable compositions.
  • Remix mode allows you to modify one of the 4 generated grid images with a new prompt — click the Remix button on a variation you like, change part of the prompt (lighting, style, season, mood), and get a new variation that preserves the composition you liked.
  • The `/describe` command in Midjourney accepts any image URL and generates 4 prompt suggestions that would produce similar images — invaluable for reverse-engineering the prompting style behind images you admire and for building your prompt vocabulary.
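Because every Midjourney parameter is appended as a flag, a small builder keeps prompts consistent across a team. The helper below is a sketch covering only the flags discussed in this lesson; the flag spellings match the bullets above, but the function itself is not a Midjourney API.

```python
def mj_command(prompt, ar=None, sref=None, cref=None,
               stylize=None, chaos=None, no=None):
    """Assemble a Midjourney prompt string with v7 parameters.

    Keyword names mirror the flags described in the lesson:
    ar -> --ar, sref -> --sref, cref -> --cref,
    stylize -> --s, chaos -> --c, no -> --no (a list of exclusions).
    """
    cmd = [prompt]
    if ar:
        cmd.append(f"--ar {ar}")
    if sref:
        cmd.append(f"--sref {sref}")
    if cref:
        cmd.append(f"--cref {cref}")
    if stylize is not None:          # --s 0 is a valid, literal setting
        cmd.append(f"--s {stylize}")
    if chaos is not None:
        cmd.append(f"--c {chaos}")
    if no:
        cmd.append("--no " + ", ".join(no))
    return " ".join(cmd)
```

For example, `mj_command("a lighthouse at dusk", ar="16:9", stylize=250)` yields a single string ready to paste after `/imagine`.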
6. Reve Image: Prompt Adherence King — Complex Multi-Element Scenes

20 min · 40 XP

Reve follows complex prompts more accurately than any other model. Perfect for scenes with multiple specific objects, relationships, and spatial layout.

  • Reve was purpose-built with prompt adherence as its primary optimization target — while other models prioritize aesthetics or photorealism, Reve was trained specifically to follow every detail of a complex prompt as faithfully as possible.
  • Reve handles spatial relationship prompts that other models fail at — phrases like 'a blue sphere to the left of a wooden table, with a red cup positioned on top of the table, and a cat sitting underneath on a green mat' are rendered accurately and reliably.
  • Reve supports unusually long, detailed prompts — up to 1000 words without losing fidelity to early details — which makes it the only model suitable for extremely detailed scene specifications like illustrated children's book spreads or complex infographic backgrounds.
  • Best use cases for Reve: scientific and technical illustrations that require accurate spatial layout, children's book scenes with specific character interactions, infographic backgrounds with multiple labeled elements, and complex product composition shots.
  • Reve's weakness is that its aesthetic style can feel more literal and less artistically polished than Midjourney — it follows instructions precisely but may not add the 'creative spark' of artistic interpretation. Use Reve for accuracy, Midjourney for beauty.
  • A powerful two-tool workflow: use Reve to establish perfect composition and spatial layout, export the result, then use it as an img2img reference in Midjourney or Flux 2 to refine the aesthetic quality while preserving the accurate structure.
  • Reve's multi-character scene generation is noticeably more reliable than competitors — when you need 4 specific characters interacting in precise ways within a single image, Reve is the most likely to produce the correct scene on the first attempt.
  • Always write Reve prompts in the style of a technical specification rather than creative prose — be explicit, sequential, and positional: 'centered composition, character A on the left, character B on the right, both facing each other, background: forest at dusk'.
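The specification style above can be generated from a structured layout, which keeps complex multi-element scenes consistent across a series. This helper is an illustrative sketch of that convention, not a Reve feature; positions and descriptions are arbitrary examples.

```python
def spec_prompt(composition, elements, background):
    """Render a layout spec as explicit, positional prose.

    `elements` maps a position ('left', 'right', ...) to a description;
    clause order follows the dict's insertion order, matching the
    sequential style the lesson recommends.
    """
    clauses = [f"{composition} composition"]
    clauses += [f"{desc} on the {pos}" for pos, desc in elements.items()]
    clauses.append(f"background: {background}")
    return ", ".join(clauses)

prompt = spec_prompt(
    "centered",
    {"left": "a knight in silver armor", "right": "a dragon curled asleep"},
    "forest at dusk",
)
```

Generating the spec from data also makes it easy to swap one element (say, the background) between pages of a picture book while every spatial relationship stays fixed.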
7. Ideogram 3.0: 90% Text Accuracy — Posters, Thumbnails and Typography Mastery

18 min · 35 XP

The only AI model that renders readable text inside images reliably. Essential for thumbnails, posters, book covers, and marketing assets.

  • Ideogram 3 achieves 90%+ text rendering accuracy inside images — a benchmark that represents a 5x improvement over the next best competitor, solving the longstanding problem that made AI-generated marketing assets unusable due to gibberish text.
  • Use Ideogram's quote syntax for precise text placement — write 'a motivational poster that says "Your Dreams Are Valid"' and the quoted text will appear legibly within the image, rendered in a style that matches the overall aesthetic.
  • Ideogram 3 supports multiple separate text elements in different positions — you can specify a large title at the top, a subtitle in the middle, and a URL or tagline at the bottom, with each rendered correctly and styled consistently.
  • Ideogram's typography integrates naturally with the image — text appears to belong in the scene, with matching perspective, lighting, color treatment, and stylistic effects rather than looking like text pasted on top as an afterthought.
  • Best use cases for Ideogram: YouTube thumbnails requiring bold text overlays, social media graphics with quotes or captions, event flyers and announcements, book and album covers, educational infographics, and any marketing asset where text is a core element.
  • Ideogram's Magic Prompt feature automatically enhances your prompt — it adds stylistic detail, lighting suggestions, and compositional improvements based on your intent, significantly improving output quality for users who aren't yet expert prompters.
  • For precise control over text appearance, specify font style and weight in your Ideogram prompt: 'bold sans-serif white text with a drop shadow', 'elegant serif gold lettering', or 'handwritten brush script in navy blue' all produce distinctly different typographic treatments.
  • Pair Ideogram with Photoshop or Canva for final production assets — generate the AI image with perfectly rendered text in Ideogram, then adjust layout, add brand elements, resize for different platforms, and ensure exact color match to brand guidelines.
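A small helper can enforce the quote syntax when a design carries several text elements, so nothing slips in unquoted. This sketch only formats the prompt string; the placement phrases and example copy are invented for illustration.

```python
def ideogram_prompt(scene, texts):
    """Build a prompt with each text element in double quotes.

    `texts` maps a placement phrase to the literal copy to render;
    the quoted string is what Ideogram is expected to draw verbatim.
    """
    placements = [f'the text "{copy}" {where}' for where, copy in texts.items()]
    return f"{scene}, with " + ", ".join(placements)

poster = ideogram_prompt(
    "a bold motivational poster, sunrise over mountains",
    {
        "as a large title at the top": "Your Dreams Are Valid",
        "as a small tagline at the bottom": "start today",
    },
)
```

Keeping copy in a dict also makes it trivial to localize: swap the values, regenerate, and the layout description stays untouched.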
8. Recraft V4: Logos, SVG Vectors and Brand Identity Design

20 min · 40 XP

Recraft V4 is the only AI tool that outputs true SVG vectors. Generate logos, icons, and brand assets that scale perfectly.

  • Recraft V4 is the only mainstream AI image tool that outputs true SVG (Scalable Vector Graphics) files — meaning logos and icons it generates can be scaled from a 16px favicon to a 10-meter billboard without any pixelation or quality loss.
  • Recraft understands design briefs written in plain English — 'a minimalist tech startup logo, geometric shield shape, sans-serif wordmark, navy blue and white, conveying trust and innovation' produces on-brief results that rival junior designer work.
  • Recraft's style system covers the full range of modern design aesthetics: flat 2D, 3D rendered, glassmorphism, neumorphism, line art, filled icons, outlined icons, isometric, and hand-drawn — each producing a consistently styled set of outputs.
  • Batch icon set generation is one of Recraft's killer features — describe a set of 20 related icons in one prompt ('a set of 20 e-commerce icons including cart, payment, shipping, returns, and reviews in a consistent flat style') and get a coherent, professional-quality icon set in minutes.
  • Recraft integrates directly with the Figma design workflow — export SVG files directly and open them as editable vector layers in Figma, where you can modify colors, adjust paths, combine with other elements, and prepare for developer handoff.
  • Recraft is the starting point for brand identity work: generate logo concepts, icon systems, and brand pattern elements in Recraft, then refine and finalize in Adobe Illustrator or Figma before switching to Midjourney or Flux 2 for marketing photography that uses the established visual language.
  • Color control in Recraft is unusually precise — you can specify exact hex color codes in your prompt and the model will use those exact colors in the output SVG, enabling brand-accurate generation without manual recoloring.
  • For custom brand fonts in Recraft logos, describe the typographic style in detail ('geometric sans-serif with rounded corners, similar to Nunito') rather than specifying a font name — this produces stylistically appropriate letterforms without copyright concerns.
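Why SVG scales losslessly is easy to see in the format itself: the geometry lives in an abstract `viewBox` coordinate system, and only the rendered `width`/`height` change between a favicon and a billboard. The logo shape and hex color below are made-up placeholders.

```python
# A placeholder vector shape, using an exact hex color as the lesson describes.
LOGO_PATHS = '<circle cx="50" cy="50" r="40" fill="#1B2A4A"/>'

def svg_at(size_px, paths=LOGO_PATHS):
    """Emit the same vector shapes at any pixel size.

    Only the width/height attributes change; the geometry inside the
    fixed viewBox does not, which is why there is no quality loss.
    """
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{size_px}" height="{size_px}" viewBox="0 0 100 100">'
            f'{paths}</svg>')

favicon = svg_at(16)        # favicon-sized
billboard = svg_at(10000)   # billboard-sized, identical geometry
```

A raster PNG enlarged 625x would interpolate pixels; here the renderer simply re-rasterizes the same curves at the new size.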
9. GPT Image 1.5: Conversational Editing and Instruction Chains

18 min · 35 XP

OpenAI's GPT Image 1.5 lets you edit images through natural conversation. Chain instructions to transform images step by step.

  • GPT Image 1.5 brings conversational image editing to AI — after generating an initial image, you continue the conversation with natural instructions: 'now make her smile', 'change the background to a beach at sunset', 'add a coffee mug on the table'.
  • In-painting through conversation eliminates the need for separate mask tools — simply describe what region you want changed and GPT Image 1.5 infers the correct area to modify, preserving everything else in the image with impressive consistency.
  • Instruction chains allow 10 or more sequential edits while the core image remains coherent — this is a fundamentally different workflow from regenerating from scratch each time, enabling progressive refinement toward a specific creative vision.
  • GPT Image 1.5 excels at product image editing: remove or replace the background, add drop shadows, change the product color, adjust lighting conditions, or place the product in a lifestyle setting — all through natural conversation.
  • Cultural and cinematic style references work remarkably well: 'make it look like a Wes Anderson film' produces symmetrical framing and pastel color palettes; 'give it a Wong Kar-wai feel' adds warm amber tones and motion blur; these references are understood without long explanations.
  • Access GPT Image 1.5 through ChatGPT Plus (web and mobile interface) or via the OpenAI API with the `gpt-image-1.5` model identifier for programmatic use in applications, automation pipelines, and batch image editing workflows.
  • The conversational interface makes GPT Image 1.5 ideal for clients and non-technical collaborators — instead of learning prompt syntax, they simply describe what they want as if talking to a human designer, making iterative collaboration more natural.
  • Combine GPT Image 1.5 with other tools strategically: use Nano Banana Pro or Flux 2 for the initial high-quality base image, then switch to GPT Image 1.5 for conversational refinement, editing, and variation — getting the best of both workflows.
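An instruction chain is just an ordered sequence of edit turns. The sketch below models that structure as a plain message list; the actual request/response wiring to an API client (and whatever model identifier it accepts) is deliberately left out, since it varies by provider and this message shape is an assumption, not a documented schema.

```python
def build_edit_chain(base_prompt, edits):
    """Represent a conversational edit session as an ordered turn list.

    Turn 0 generates the base image; each later turn carries exactly one
    instruction. A client would send these in order, feeding each
    returned image into the next request.
    """
    turns = [{"role": "user", "content": base_prompt}]
    turns += [{"role": "user", "content": step} for step in edits]
    return turns

chain = build_edit_chain(
    "a product shot of a ceramic mug on a white table",
    [
        "now make the mug navy blue",
        "change the background to a beach at sunset",
        "add a soft drop shadow under the mug",
    ],
)
```

Keeping one instruction per turn is what makes the chain debuggable: if step 3 goes wrong, you re-run from step 2 instead of rebuilding the whole image.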
10. Cinematic Lighting: Rembrandt, Golden Hour, Studio, Neon Noir and Bioluminescent

22 min · 45 XP

Lighting is the single most powerful lever for image quality. Master 15 lighting setups used by professional photographers and cinematographers.

  • Rembrandt lighting — named after the Dutch master painter — places a single key light at 45 degrees to the side and above the subject, creating the signature triangular highlight on the shadowed cheek. It is one of the most flattering and dramatic portrait lighting setups.
  • Golden hour lighting (roughly the first hour after sunrise and the last hour before sunset) produces warm orange-amber tones with long, soft shadows that add depth and emotion to any subject — specifying 'golden hour backlit portrait' is one of the fastest single upgrades you can make to an image prompt.
  • Blue hour (the 20 minutes after sunset) produces a cool, even cobalt-blue ambient light that eliminates harsh shadows — combined with lit interior windows and neon signs, it creates the atmospheric, moody twilight look used in luxury brand advertising and cinematic photography.
  • Studio three-point lighting (key light for main illumination + fill light to soften shadows + rim/hair light to separate subject from background) produces the clean, professional look of commercial photography — add these three terms to portrait prompts for instantly more polished results.
  • Neon noir lighting combines colored neon signs, rain-slicked streets, volumetric fog, and deep shadows — the defining visual language of cyberpunk and neo-noir films. Specify colors explicitly: 'pink and cyan neon reflections on wet asphalt' for maximum atmospheric impact.
  • Bioluminescent lighting describes organic glowing light sources in dark natural environments — glowing jellyfish in black ocean water, fireflies in a midnight forest, luminous deep-sea creatures — creating a magical, otherworldly quality that's popular in fantasy and sci-fi illustration.
  • Volumetric lighting (also called god rays or crepuscular rays) describes visible beams of light cutting through atmospheric particles like dust, fog, or smoke — specify 'volumetric light shafts through dusty warehouse windows' to add dramatic depth and a sense of scale to environments.
  • Chiaroscuro (the dramatic contrast between light and dark) is a technique from Renaissance painting that works powerfully in AI image prompts — 'dramatic chiaroscuro lighting, 90% shadow, single spotlight' produces painting-quality dramatic portraits with intense emotional weight.
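Lighting vocabulary like the above lends itself to a small preset table, so a team applies the same phrasing every time. The preset wording here is condensed from this lesson's bullets; the dictionary itself is an illustrative convention, not any tool's feature.

```python
# Named lighting recipes, condensed from the setups described above.
LIGHTING_PRESETS = {
    "rembrandt": "Rembrandt lighting, single key light at 45 degrees above, "
                 "triangular cheek highlight",
    "golden_hour": "golden hour backlit, warm orange-amber tones, long soft shadows",
    "three_point": "key light, fill light, rim light, clean studio background",
    "neon_noir": "pink and cyan neon reflections on wet asphalt, "
                 "volumetric fog, deep shadows",
    "chiaroscuro": "dramatic chiaroscuro lighting, 90% shadow, single spotlight",
}

def with_lighting(subject, preset):
    """Append a named lighting recipe to a subject description."""
    return f"{subject}, {LIGHTING_PRESETS[preset]}"
```

Swapping only the preset while holding the subject fixed is also a fast way to learn, side by side, what each lighting term actually contributes.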
11. Negative Prompts, Quality Tokens and Exclusion Techniques

15 min · 30 XP

Learn the hidden layer of AI image prompting: what to exclude. Negative prompts and quality tokens dramatically improve output quality.

  • Negative prompts are a separate instruction layer that tells the AI what to exclude from the generated image — they are just as important as the positive prompt for achieving professional-quality results, especially for eliminating common AI artifacts.
  • The universal negative prompt for photorealistic work: 'blurry, deformed, distorted, ugly, bad anatomy, extra fingers, missing limbs, watermark, text, logo, low quality, JPEG artifacts, overexposed, underexposed' — paste this into every realistic portrait or photography prompt.
  • Quality tokens in the positive prompt boost overall detail and resolution — terms like 'masterpiece', 'best quality', 'ultra-detailed', 'sharp focus', 'professional photography', and '8K resolution' signal to the model that maximum quality is expected.
  • Style exclusions prevent the AI from defaulting to its preferred aesthetic — if you want a photorealistic image, adding 'no anime style, no cartoon, no illustration, no painting, no digital art' prevents drift toward those styles that dominate many models' training data.
  • Weight syntax in Stable Diffusion and compatible models lets you fine-tune the importance of any word — `(sharp focus:1.5)` increases emphasis by 50%, `(blurry:0.3)` dramatically reduces but doesn't eliminate that quality, enabling subtle artistic control.
  • Midjourney's negative prompt syntax uses the `--no` parameter at the end of the prompt — `/imagine a portrait of a woman --no background, text, logo, blurry` — while Stable Diffusion, ComfyUI, and most other tools provide a dedicated separate negative prompt field.
  • Negative prompts are iterative — generate an image, identify specific problems (hands with too many fingers, a watermark in the corner, oversaturated colors), then add those specific problems to your negative prompt and regenerate until the issues are resolved.
  • For artistic and stylized work, use positive quality tokens rather than negative prompts — 'intricate details, award-winning illustration, trending on ArtStation, perfect composition' is more effective for artistic work than trying to exclude what you don't want.
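The `(token:weight)` attention syntax is easy to get subtly wrong by hand (stray spaces, missing parentheses), so a one-line formatter helps. This is a minimal sketch of the syntax as described above; whether a given tool honors it depends on that tool's prompt parser.

```python
def weight(token, w):
    """Format Stable Diffusion-style attention syntax: (token:weight).

    w > 1.0 emphasizes the term; w < 1.0 de-emphasizes it without
    removing it entirely.
    """
    return f"({token}:{w:.1f})"

positive = ", ".join(["portrait of a sailor", weight("sharp focus", 1.5)])
negative = ", ".join(["text", "logo", weight("blurry", 0.3)])
```

Low-weighting a term in the negative prompt, as in `weight("blurry", 0.3)`, is a gentler alternative to a hard exclusion when a trace of the quality is acceptable.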
12. Character Consistency: Cref, IP-Adapter, Face Lock and Multi-Image Coherence

28 min · 55 XP

The hardest problem in AI image generation. Keep the same character across many images using the most effective tools in 2026.

  • Midjourney's `--cref` parameter is the most accessible character consistency tool — provide a URL to a portrait image and Midjourney will maintain that character's face, hairstyle, and distinctive features across new prompts with different settings and poses.
  • IP-Adapter (Image Prompt Adapter) is an open-source technique that injects a reference face image into the Stable Diffusion generation process at the embedding level — producing more identity-faithful results than text description alone, especially for less famous faces.
  • InstantID combined with ControlNet achieves the highest character consistency accuracy in the Stable Diffusion ecosystem — InstantID locks face identity while ControlNet simultaneously controls pose, giving you a specific person in any body position.
  • Creating a canonical character sheet before starting a project is a critical professional workflow: generate front-view, 3/4-view, side-view, and back-view portraits of your character as foundational reference images that all subsequent generations must match.
  • Seed preservation is the simplest character consistency technique — using the same seed value produces the same underlying 'person' even when you change the prompt. Note that changes to other prompt elements (lighting, style) can still cause drift, so treat seeds as a starting point.
  • The character identity drift problem — where the same character looks subtly different across 10+ images — is still an active research problem with no perfect solution. Professional AI illustrators budget for significant curation and regeneration to find frames that match.
  • Runway Gen-4.5 and Kling 3.0 support character-consistent image-to-video animation — upload your canonical character portrait and animate them in new scenes while maintaining their established visual identity across both still images and video.
  • For highest-volume consistent character work, consider training a custom LoRA (Low-Rank Adaptation) model on 20-50 images of your specific character — this fine-tunes Stable Diffusion to generate that character reliably without reference images in every prompt.
13. ControlNet, Inpainting, Outpainting and Image-to-Image Transformation

30 min · 60 XP

Advanced image manipulation techniques. Control composition precisely, repair or extend images, and transform existing photos with AI.

  • ControlNet is an extension for Stable Diffusion that conditions image generation on a structural guide — edges, depth maps, pose skeletons, normal maps, or segmentation masks — giving you precise control over composition that text prompts alone cannot achieve.
  • OpenPose extraction lets you take a body pose from any existing photo and apply it to a completely different character — extract the pose from a fashion editorial, then generate your product model in that exact pose with a prompt describing the new character.
  • Canny edge control preserves the exact structural outlines of an existing image while completely transforming its visual style — use it to render a pencil sketch as photorealistic, or convert a photographic composition into an oil painting with the exact same shapes.
  • Inpainting lets you erase and regenerate specific regions of an image without affecting the surrounding area — select a region, describe what should replace it, and the model fills it with contextually appropriate content that seamlessly blends with the unchanged surrounding pixels.
  • Outpainting expands an image beyond its original borders by generating new content that extends naturally from the edges — Adobe Photoshop's Generative Expand uses this technique to change a portrait from square to widescreen while maintaining the subject perfectly.
  • Image-to-image (img2img) transformation uses an existing image as the starting point for generation instead of random noise — this is how you can change the style of a photograph, add elements to a scene, or convert between artistic styles while preserving composition.
  • Denoising strength is the key parameter in img2img: 0 means 'don't change anything' and 1 means 'completely ignore the input and generate from scratch'. Values around 0.4-0.6 strike the balance between maintaining the original composition and applying meaningful stylistic changes.
  • Depth map control in ControlNet extracts the 3D depth information from any image and uses it to guide new generations — placing objects at the correct depth relationships and perspective without any 3D modeling required.
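Denoising strength has a concrete mechanical reading in common Stable Diffusion front-ends: it sets what fraction of the sampler's steps actually run on the noised input. The sketch below illustrates that scheduling rule as an assumption about typical UIs, not a specification of any particular one.

```python
def img2img_steps(total_steps, strength):
    """How many sampler steps run in an img2img pass.

    strength 0.0 keeps the input untouched (0 steps); 1.0 regenerates
    from pure noise (all steps); the 0.4-0.6 range the lesson recommends
    reworks roughly half the schedule, which is why it balances
    preserving composition against applying real stylistic change.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return round(total_steps * strength)
```

For a 30-step sampler, strength 0.5 runs 15 steps: enough denoising to restyle surfaces, not enough to erase the layout inherited from the source image.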
14. Stable Diffusion 3.5 and ComfyUI: Free Open-Source Power Workflows

30 min · 60 XP

Run world-class AI image generation for free on your own hardware. Master ComfyUI's node-based workflow system.

  • Stable Diffusion 3.5 Large is a free, open-source AI image model from Stability AI that benchmarks comparably to commercial models like Midjourney and Flux 2 — with the critical advantage that you can run it locally with no subscription fees and no per-image costs.
  • Local installation requires a GPU with at least 8GB VRAM — an NVIDIA RTX 3070 or 4070 can run SD 3.5 Large with quantized weights, while higher-VRAM cards allow full precision. If you don't have a local GPU, Google Colab's free tier provides enough compute for dozens of test generations.
  • ComfyUI is the professional interface for Stable Diffusion — a visual node graph where you connect models, samplers, LoRAs, ControlNet, upscalers, and post-processors into custom pipelines, enabling highly sophisticated workflows no commercial interface can replicate.
  • LoRA (Low-Rank Adaptation) models are small fine-tuned add-ons that modify the base model's style, subject matter, or characters — stack a photography LoRA, a specific lighting LoRA, and a character LoRA simultaneously to create highly customized outputs.
  • The VAE (Variational Autoencoder) is the component that translates between the model's internal latent space and actual pixel images — different VAEs produce different color saturation, sharpness, and detail characteristics, so choosing the right VAE for your use case matters.
  • ComfyUI workflows can be exported and imported as JSON files — the community shares thousands of complete production workflows on GitHub, Reddit, and Civitai, meaning you can download professional-grade AI image pipelines and run them immediately without building from scratch.
  • Civitai is the premier community hub for the open-source AI image ecosystem — thousands of free models, LoRAs, embeddings, and ComfyUI workflows contributed by the community, covering every imaginable style, subject, and use case.
  • Self-hosting Stable Diffusion is the most cost-effective path for high-volume image generation — after the initial hardware investment, the marginal cost per image is essentially electricity, compared to roughly $0.02-$0.08 per image on commercial APIs.
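The self-hosting economics above reduce to a simple break-even calculation. The helper below sketches it; the specific numbers in the example (a $600 used GPU, $0.04/image API pricing, roughly $0.002/image in electricity) are illustrative assumptions, not quotes from any provider.

```python
import math

def break_even_images(hardware_cost: float,
                      api_cost_per_image: float,
                      local_cost_per_image: float) -> int:
    """Number of images at which self-hosting beats a commercial API.

    Each locally generated image saves (api - local) dollars; divide
    the hardware cost by that saving and round up.
    """
    savings = api_cost_per_image - local_cost_per_image
    if savings <= 0:
        raise ValueError("API must cost more per image than local generation")
    return math.ceil(hardware_cost / savings)

# Illustrative assumption: $600 GPU, $0.04/image API, ~$0.002/image electricity
print(break_even_images(600, 0.04, 0.002))  # 15790 images to break even
```

At a few hundred images per day, a setup like this pays for itself within a couple of months; below that volume, commercial APIs are usually the cheaper option.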
15

Adobe Firefly: Commercially Safe Generation for Brands and Agencies

18 min35 XP

Adobe Firefly is trained only on licensed content — making it the safest choice among major AI image tools for commercial work.

  • Adobe Firefly is the only major AI image model trained exclusively on licensed Adobe Stock content and public domain imagery — this training data provenance means every Firefly output is commercially safe to use, without the copyright ambiguity that affects Midjourney, Stable Diffusion, and other models.
  • Every image generated by Adobe Firefly is automatically tagged with C2PA (Coalition for Content Provenance and Authenticity) Content Credentials — a cryptographically signed metadata record of the AI tool used and the generation history, enabling content verification and provenance tracking.
  • Generative Fill in Photoshop (powered by Firefly) brings context-aware inpainting directly into your existing Photoshop workflow — select any region with the lasso tool, type what you want, and Firefly fills it seamlessly, matching the surrounding lighting, perspective, and style.
  • Generative Expand outpaints any image to any aspect ratio in seconds — select a narrow portrait photo, drag the canvas to widescreen dimensions, and Firefly generates plausible, matching background content on both sides, saving hours of manual background extension work.
  • Text Effects is Firefly's signature feature for creative typography — generate text in which each letter is filled with complex textures (fire, water, flowers, circuits, stone) that wrap naturally around the letterform, producing editorial-quality typographic artwork in seconds.
  • Structure Reference in Firefly lets you upload a reference image to control composition while completely changing the content — keep the structural layout of a bedroom shot but render it in a completely different style or with different furniture and colors.
  • Firefly is included with Adobe Creative Cloud subscriptions at no extra cost, and is also available at firefly.adobe.com with a free tier for non-CC users — making it the easiest entry point for advertising agencies, brands, and creative professionals who need commercially safe AI imagery.
  • Adobe's commercial safety guarantee is backed by an indemnification policy for enterprise customers — if a Firefly output leads to a copyright claim, Adobe provides legal protection, one of the strongest guarantees any major AI image company currently offers.
16

Upscaling, Post-Processing and Preparing AI Art for Print and Commercial Use

15 min30 XP

Take AI images from screen resolution to billboard quality. Master upscaling workflows for print, merchandise, and commercial delivery.

  • Topaz Gigapixel AI is the industry-leading upscaling software — it uses a dedicated neural network trained specifically to reconstruct fine detail during enlargement, producing 2x to 16x upscales that add genuine sharpness rather than just interpolated blurriness.
  • Real-ESRGAN is the best free open-source upscaler — available as a standalone app and built into ComfyUI, it produces results that rival commercial tools for most use cases and is completely free for both personal and commercial use.
  • The minimum resolution for professional print is 300 DPI (dots per inch) at the final output size — if an image will be printed at 4x6 inches, you need at least 1200 x 1800 pixels; at A3 size (11.7 x 16.5 inches) you need 3510 x 4950 pixels.
  • An A4 page printed at 300 DPI requires a minimum of 2480 x 3508 pixels — most AI image generators produce 1024 x 1024 pixels by default, so upscaling with Topaz or Real-ESRGAN is almost always necessary before sending AI images to a print house.
  • Color profiles determine how colors appear on different devices and media: sRGB is the standard for screens and web (browsers, social media, monitors); Adobe RGB captures a wider gamut for photography; CMYK is required by most print houses for physical print production.
  • Removing lingering AI artifacts after generation is faster with AI tools than manual retouching — use Photoshop's Generative Fill on problem areas (extra fingers, merged faces, background glitches), or use Topaz Photo AI's Autopilot to clean noise and sharpen detail simultaneously.
  • File format selection has significant quality and size implications: use PNG for any image with transparency or sharp graphic elements (logos, text), TIFF for archival print files (lossless, large), and JPG (at 90%+ quality) for web images where file size matters.
  • Batch processing is essential for volume work — both Topaz Gigapixel and Real-ESRGAN support command-line and batch modes, allowing you to upscale an entire folder of AI-generated images overnight without manual intervention for each file.
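The DPI arithmetic in this lesson is easy to script. A small pure-Python helper, using the same sizes quoted above (A-series dimensions converted from millimetres, rounded to the nearest pixel):

```python
MM_PER_INCH = 25.4  # exact by definition

def pixels_for_print(width_in: float, height_in: float, dpi: int = 300):
    """Minimum pixel dimensions to print at the given DPI.

    Rounded to the nearest pixel, matching the figures print houses
    usually quote.
    """
    return round(width_in * dpi), round(height_in * dpi)

print(pixels_for_print(4, 6))      # (1200, 1800) -- 4x6" photo
print(pixels_for_print(210 / MM_PER_INCH, 297 / MM_PER_INCH))  # A4: (2480, 3508)
print(pixels_for_print(11.7, 16.5))  # A3 (rounded inches): (3510, 4950)
```

Run it before sending a file to a print house: if your upscaled image is smaller than the returned dimensions, upscale further or reduce the print size.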
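For overnight batch runs, one common approach is to generate one upscaler command per file and execute them in a loop. The sketch below assumes the ncnn-vulkan build of Real-ESRGAN, whose CLI takes `-i`, `-o`, and `-n` flags; the binary name, model name, and output suffix are assumptions to adapt to your installation.

```python
from pathlib import Path

def build_upscale_commands(in_dir: str, out_dir: str,
                           model: str = "realesrgan-x4plus"):
    """Build one Real-ESRGAN command per image in a folder.

    Assumes the realesrgan-ncnn-vulkan binary is on PATH; run each
    returned command with subprocess.run() to process the folder
    unattended.
    """
    exts = {".png", ".jpg", ".jpeg", ".webp"}
    commands = []
    for img in sorted(Path(in_dir).iterdir()):
        if img.suffix.lower() in exts:
            out = Path(out_dir) / f"{img.stem}_4x.png"
            commands.append(["realesrgan-ncnn-vulkan",
                             "-i", str(img), "-o", str(out), "-n", model])
    return commands
```

Driving an external binary this way keeps the loop restartable: if the run dies halfway, skip any file whose `_4x.png` output already exists and resume.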
17

Project: 5-Tool Portfolio β€” Create 5 Masterpieces Across 5 Different Models

45 min90 XP

Your final project. Create one professional-grade image with each of 5 tools, build a portfolio, and share on social media.

  • Piece 1: Create a photorealistic portrait with Nano Banana Pro — write a detailed prompt specifying age, appearance, clothing, lighting, and mood, then select the best of 4 batch-generated variations and upscale to 4K using the built-in upscaling.
  • Piece 2: Create a commercial product shot with Flux 2 — choose a real or imaginary product, write a commercial photography prompt with studio lighting specifications, and produce an image you could genuinely use in a marketing campaign.
  • Piece 3: Create an artistic illustration with Midjourney v7 — use `--sref` with a reference image from an artist you admire, apply `--stylize 750` for strong aesthetic treatment, and experiment with `--ar 2:3` for a poster-format composition.
  • Piece 4: Create a typographic poster with Ideogram 3 — design a poster for an imaginary event, brand, or cause with at least 3 text elements (headline, subheading, date/URL), ensuring all text is readable and design-appropriate.
  • Piece 5: Create an SVG logo with Recraft V4 — design a logo for yourself, a fictional company, or a cause you care about, specifying style (minimalist, geometric, organic), color palette, and the brand values it should communicate.
  • Assemble all 5 pieces into a professional portfolio using Canva or Adobe Express — create a consistent layout, add brief captions for each piece, and ensure the overall presentation feels cohesive and impressive to a potential client or employer.
  • Document the prompt used for each piece — transparency about AI usage is increasingly expected in professional contexts, and sharing your prompt alongside the output demonstrates mastery and builds trust with clients who are curious about your process.
  • Share your portfolio on social media with appropriate AI disclosure hashtags (#AIArt, #GeneratedWithAI) and tag the tools used — the AI creative community is active and supportive, and professional-quality student portfolios regularly attract significant attention and opportunities.

Ready to Start Learning?

Create a free account to track your progress, earn XP and badges, and unlock your certificate.