Midjourney vs. ChatGPT: A Comprehensive Comparison of AI Image Generation

A comparison of the image generation capabilities of Midjourney and ChatGPT, exploring their strengths, weaknesses, and suitability for different use cases. It covers portrait photography, text handling, prompt adherence, and creative experimentation, and offers insights to help you choose the right AI tool for your needs.

April 16, 2025

Discover the power of AI-generated images as we pit ChatGPT and Midjourney against each other in a comprehensive comparison. Explore the strengths and limitations of these cutting-edge tools, and learn how to leverage them to elevate your creative projects.

Comparison of ChatGPT and Midjourney for Portrait Photography

When it comes to portrait photography, both ChatGPT and Midjourney have their strengths and weaknesses. Overall, Midjourney appears to produce more authentic and lifelike portraits, with better skin textures and natural imperfections. The depth of field and water droplet effects also feel more realistic in Midjourney's outputs.

However, ChatGPT demonstrates stronger prompt adherence, accurately capturing specific details like lighting, clothing, hair, and even small features like scars and tattoos. This makes ChatGPT better suited for portraits with complex, detailed prompts.

ChatGPT struggles with hands and anatomical accuracy. Midjourney generally handles hands and finger positioning better, though both models have room for improvement here.

In terms of color balance, both models tend to have a slightly warm, yellow hue, which can be corrected with a simple white balance adjustment. Midjourney's portraits also have a more cinematic, aesthetic quality, while ChatGPT's outputs can appear more saturated and posed.
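
The white balance correction mentioned above can be sketched with the gray-world method, which is one common approach and not necessarily what any particular photo editor does. The function name `gray_world_balance` and the sample pixel values are assumptions made up for illustration. A real workflow would load pixels from an image file, or simply use the white balance slider in an editing app.

```python
def gray_world_balance(pixels):
    """Scale each channel so the average R, G and B values match.

    pixels: list of (r, g, b) tuples with values in 0..255.
    This is the gray-world assumption: the average color of a scene is neutral gray.
    """
    n = len(pixels)
    # Average value of each channel across the whole image.
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3
    # Gain per channel; leave a channel alone if it is entirely zero.
    gains = [gray / m if m else 1.0 for m in means]
    return [
        tuple(min(255, max(0, round(p[c] * gains[c]))) for c in range(3))
        for p in pixels
    ]


# Hypothetical warm, yellow-tinted pixels (red and green high, blue low).
warm = [(220, 200, 150), (180, 160, 110), (120, 100, 60)]
balanced = gray_world_balance(warm)
print(balanced)
```

After the adjustment the red channel is pulled down and the blue channel is lifted, which removes the yellow cast.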

Overall, the choice between the two models for portrait photography will depend on the specific needs of the project. Midjourney may be better suited for more natural, artistic portraits, while ChatGPT excels at capturing intricate details and adhering to complex prompts.

Prompt Adherence and Coherence of ChatGPT and Midjourney

When it comes to prompt adherence and coherence, the comparison between ChatGPT and Midjourney reveals some key differences:

  • Prompt Adherence: ChatGPT consistently demonstrates a stronger ability to follow the specific details and requirements outlined in the prompts. It is able to accurately replicate elements like lighting, clothing, accessories, and even small details like scars and tattoos. Midjourney, on the other hand, sometimes misses or struggles with certain aspects of the prompt, especially when the prompt becomes more complex.

  • Coherence: ChatGPT excels at maintaining coherence and logical consistency within the generated images. It is able to create images where all the elements work together seamlessly, without any morphing or distortion. Midjourney, while strong in aesthetics, can sometimes produce images where certain parts appear disjointed or incoherent when examined closely.

  • Text Generation: When it comes to generating text-based elements like signs, labels, or descriptions, ChatGPT clearly outperforms Midjourney. Midjourney struggles with anything beyond simple text, often producing garbled or incoherent results, while ChatGPT is able to accurately generate the requested text.

  • Specific Prompts: For prompts that require a high level of detail and precision, ChatGPT demonstrates a stronger ability to adhere to the specifics. This is evident in the examples provided, where ChatGPT is able to accurately recreate complex scenes with multiple elements, while Midjourney sometimes falls short.

  • Adaptability: While Midjourney excels in its default creativity and aesthetic quality, ChatGPT shows greater adaptability in its ability to handle a wider range of prompt types, from portraits to surreal scenes, without significant degradation in performance.

In summary, the key advantage of ChatGPT lies in its superior prompt adherence and coherence, particularly when it comes to complex or specific prompts, as well as its stronger text generation capabilities. Midjourney, on the other hand, maintains a lead in overall aesthetic quality and default creativity, but may struggle more with certain prompt requirements and logical consistency.

Text Generation Capabilities of ChatGPT and Midjourney

ChatGPT and Midjourney can both render text inside images, such as signs and labels, though their results differ considerably.

ChatGPT excels at reproducing the exact wording a prompt asks for. Its in-image text stays legible and contextually appropriate, even in complex prompts involving multiple elements.

In contrast, Midjourney's strength lies in its creative and aesthetic capabilities. It handles short, simple text, but anything longer often comes out garbled or incoherent, and it does not always match the wording in the prompt.

For simple or straightforward text, both models perform well. For prompts requiring strict adherence to details and coherence, ChatGPT generally outperforms Midjourney, while Midjourney shines where creative expression and aesthetics take priority over exact wording.

Ultimately, the choice depends on the task at hand. ChatGPT is better suited to images that need precise text, while Midjourney is more suitable for artistic projects where the text is secondary.

Complex Prompt Adherence and Detailed Prompts

For this first one, Midjourney was pretty close, but again this was the best out of four, and the dog was supposed to be sitting on top of the cube. ChatGPT got it perfect on the first try.

For this one with Midjourney, the toaster is a little wonky. It did get the three apples right, but the spoon isn't resting on one, and it was supposed to be a single sunflower. ChatGPT got it almost perfect. The spoon was supposed to be resting on the second apple, but it nailed everything else.
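
One way to make a comparison like this less subjective is to tally it as a checklist. The sketch below is only an illustration: the requirement names are paraphrased from the description above and are not the exact prompt, `adherence_score` is a hypothetical helper, and counting Midjourney's wonky toaster as present is an assumption.

```python
def adherence_score(checks):
    """Return the fraction of prompt requirements an image satisfied."""
    return sum(checks.values()) / len(checks)


# Hypothetical checklist for the toaster prompt, filled in from the notes above.
midjourney = {
    "toaster present": True,  # rendered, though "a little wonky" (assumption: still counts)
    "three apples": True,
    "spoon resting on the second apple": False,
    "single sunflower": False,
}
chatgpt = {
    "toaster present": True,
    "three apples": True,
    "spoon resting on the second apple": False,
    "single sunflower": True,
}

print("Midjourney:", adherence_score(midjourney))
print("ChatGPT:", adherence_score(chatgpt))
```

A tally like this ignores aesthetics entirely, so it only covers the prompt adherence half of the comparison.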

All right, then I ramped it up a lot and asked for a chess board with alternating sapphire and marble tiles. I also described how each of the pieces should look: pawns are robed travelers holding staffs, knights are armored wolves with glowing eyes, and so on for each piece. They both got the tile part, and ChatGPT got more of the pieces right than Midjourney. It actually got almost perfect on some of the attempts. Midjourney was actually closer than I thought, but not as good as ChatGPT. That is a crazy difficult prompt, something you never would have even thought to attempt a couple of months ago.

Handling Upside Down Faces and Crowds

I also tested the models' ability to handle actions or poses where the face is upside down, like a handstand or mid-cartwheel. This is usually a struggle for image models, often resulting in funny-looking results. However, when asking for a close-up, the results were not as bad.

For the shots from further away, the results were quite amusing. As a side note, I tried this in Reve, and it was actually pretty good at these types of prompts, but that comparison will be for a future video.

Another issue I encountered in Midjourney was with crowds. When there are a lot of faces, Midjourney really struggles, with a lot of morphing and distortion. For example, in a concert scene, there were only a few good faces, while the rest were heavily morphed. The same was true for a busy street scene - it looked good at first glance, but upon closer inspection, most of the faces were not well-rendered.

In contrast, ChatGPT was much better at handling these types of crowd scenes, producing more coherent and realistic results.

Censorship and Likeness Capabilities

Midjourney was found to be more relaxed with intellectual property (IP), especially when it came to generating content related to big names like Disney or Pixar. Understandably, some of that content was censored in ChatGPT. Both models could generate public figures, but they had limits on what they could depict them doing.

Midjourney struggled more with generating likenesses of lesser-known individuals. The Explore page of Sora showcased convincing celebrity selfies, but Midjourney had difficulty with less famous people. In contrast, ChatGPT was more consistent in its ability to generate accurate likenesses, regardless of the subject's level of fame.

When it came to anime-style prompts, ChatGPT performed better when asked for specific movie titles rather than artist names. Midjourney was more capable of replicating a wider range of anime styles, including those of popular directors like Mamoru Hosoda and Makoto Shinkai.

Overall, the models exhibited different strengths and weaknesses in terms of censorship and likeness capabilities. Midjourney's relaxed approach to IP allowed for more creative freedom, but it struggled with lesser-known individuals. ChatGPT, on the other hand, maintained tighter control over content but demonstrated more consistent likeness generation across a range of subjects.

Comparison of Anime and Other Art Styles

Both Midjourney and ChatGPT were able to generate images in the style of Studio Ghibli, a renowned anime studio. However, when it came to other anime styles, ChatGPT struggled more than Midjourney.

When asked for specific anime artists like Masaaki Yuasa, Makoto Shinkai, or Mamoru Hosoda, ChatGPT was unable to replicate those styles effectively. It worked better for ChatGPT when the prompt was based on movie titles, such as "in the style of Wolf Children," rather than the artist's name.

In contrast, Midjourney was more capable of replicating a wider range of anime styles, though it still had some limitations.

Beyond anime, both models were able to generate images in various other art styles, such as tilt-shift, Fauvism, mixed media, and Pixar-style. However, Midjourney seemed to have a slight edge in terms of aesthetics and default creativity, particularly when it came to more abstract or surreal prompts.

One notable advantage of ChatGPT was its ability to maintain consistent character references across multiple images, which Midjourney currently lacks. This feature can be particularly useful for creating character-driven narratives or scenes.

Overall, while both models demonstrated impressive capabilities in generating images across different art styles, Midjourney appeared to have a stronger grasp of anime-specific styles, while ChatGPT excelled in maintaining character consistency.

Surreal and Abstract Image Generation

Overall, both Midjourney and ChatGPT performed well in generating surreal and abstract images. While Midjourney may have an edge in terms of aesthetics and default creativity, the two models had a relatively close performance in this realm.

Midjourney occasionally missed specific details outlined in the prompts, such as a suit made of clouds or hands made of clouds. Still, the surreal and abstract images from both models were impressive, and the better result often came down to personal preference.

One-word prompts or vague topics also showcased the models' ability to generate creative and imaginative surreal/abstract images. In this area, Midjourney seemed to have a slight advantage, as it was able to produce more visually striking and unique outputs.

Ultimately, both Midjourney and ChatGPT demonstrated strong capabilities in the realm of surreal and abstract image generation. The choice between the two may come down to personal preference and the specific needs of the user, as each model has its own strengths and weaknesses in this domain.

FAQ