AI and the Challenging World of Non-photography
- SGW
- Oct 12, 2024
Updated: Jun 28

During a time when I was unavoidably at home a lot, I experimented with AI image-generation on ChatGPT, Sora and Midjourney. After a few months I was fascinated and entertained, but getting the results I wanted wasn't easy. It was a hit-and-miss process. ChatGPT (Chat Generative Pre-Trained Transformer, if you please) was definitely better at interpreting my prompts but took its time doing it. Figuring out all the nuances of Midjourney settings certainly got in the way of my creativity.
The image below is a good example of just how difficult it can be to get the image you have in mind. I had to give up on Midjourney: it insisted on adding a saddle and kept turning the image into an illustration until I added words like “cinematic” to the prompt. It could be that I’m just no good at writing prompts. That's why I occasionally get ChatGPT to rewrite them.



After 2 hours, 17 attempts across ChatGPT, Sora and Midjourney, and quite a lot of fiddling around in Photoshop, I ended up with the image I more or less wanted. This image originally had parts of a bit and bridle which I had to remove. The original foot had 6 toes which, among other things, I masked with a little motion blur. And I had to expand the image incrementally because the main subjects were too close to the edge of the frame. Apart from all that, it was a breeze!
In an experiment I gave this task to ChatGPT: “Create an articulate image prompt that describes a gritty, visually eye-catching 1950s black and white photograph that epitomises the human condition and the struggle to make ends meet in a so-called civilized society.” It answered:
“A gritty, black and white photograph in the style of 1950s photojournalism, capturing a raw moment on the streets of a decaying urban neighborhood. A weary man in a threadbare overcoat leans against a brick wall under a flickering streetlamp, his face etched with lines of exhaustion and quiet defiance. Nearby, a barefoot child sits on the curb, clutching a crust of bread, eyes wide with hollow innocence. Behind them, faded shop signs, broken windows, and a billboard preaching prosperity contrast cruelly with their harsh reality. The composition is stark and powerful—harsh shadows, overcast sky, wet pavement reflecting dim light—invoking the struggle of the human condition amid the illusion of civilization. Emphasize cinematic contrast, rich textures, and emotionally resonant expressions.”
Hmm… Judging by the two images below it seems ChatGPT thinks like an American. Imagine that. And incidentally, the man on the left has a torso of unequal proportions. This happens quite regularly in ChatGPT. The bottom image was a much better result from Midjourney. Does Midjourney shine when prompts are articulate and expansive?


Almost all my AI gallery images are created from descriptive text only (a prompt). Occasionally I'll try to base an AI image on an existing very old photograph. In particular, images from the 1800s fascinate me. In the examples below I repeatedly used the genuine photograph on the left without success. Eventually by using descriptive prompts I got 2 results that came close. This was after around 80 attempts.
The centre image is from ChatGPT and the other, which I prefer, is by Sora. (The Midjourney results were disappointing.) It's a pity the software couldn't recognise and reproduce the indigenous basket. You can see here too that ChatGPT has slightly distorted the figures' torsos. It seems this is more likely to occur when re-editing an image.

In an interesting experiment that gave me a headache I copied 18 Midjourney prompts and pasted them one by one into ChatGPT just to see what the differences would be. You can see the comparisons below. Generally I favour ChatGPT images. Sora would have given different results again. It should be kept in mind that with Midjourney there can be a lot of variables because of an abundance of settings. These can significantly impact the results.
The image of the elderly lady and the children suggests that maybe ChatGPT has trouble with toes (4) and Midjourney with fingers (mangled)! In Midjourney the faces of the "rugged archaeologists" are pretty awful—a waxy mess. The cup and cigarette weirdness is the kind of thing I'd probably get too regularly in Midjourney. Notice too in Midjourney the size of the teal phone and other inaccuracies; ChatGPT's desk is peculiar too. See below for all the original Midjourney prompts.
On a PC, after clicking on these 2 images you can drag them onto your desktop to view full-size .webp files, or open them in a new tab and magnify.


Here are users' original prompts for each of the images above (I'm sure Midjourney users wouldn't mind):
A vintage-style photo of a sweet moment on a front porch, featuring an elderly grandmother sitting in a wooden rocking chair and talking with three young children. They're holding glasses of iced tea. The setting is a cozy, old-fashioned home with a wooden porch, pillars, and a front door behind them. The color palette has a soft, warm, slightly faded tone with gentle vignetting and worn edges, evoking the look of an old family photo from the 1950s or 60s. The grandmother has short, white hair and wears a simple, light-colored long-sleeved blouse and long skirt with comfortable sandals. She’s smiling warmly, hands folded in her lap, listening or talking to the children. The children sit side-by-side on the porch steps. From left to right: A young girl with curly hair, wearing a floral dress and sandals, smiles sweetly. A young boy in a red plaid shirt and shorts laughs, looking at one of the girls. A slightly older girl with straight blonde hair and a white t-shirt with shorts, also smiling. The atmosphere is nostalgic, peaceful, and storytelling-rich--capturing the warmth of multigenerational bonding and cherished summer afternoons spent on grandma’s porch.
Badass CS2-style avatar holding a golden Desert Eagle, standing on Mirage map (dusty Middle Eastern city), cinematic shot, hyper-detailed, tactical gear, futuristic sunglasses, dynamic lighting, sandstorm effects, graffiti on walls, Unreal Engine 5 render, 8K, photorealistic, first-person shooter vibe, motion blur, action pose
oblate spheroid futurist car, racing on the surface of the moon
Two rugged archaeologists in retro adventure gear kneeling beside an ancient dinosaur fossil in a late afternoon dig site, grassy field, vintage expedition aesthetic, cinematic film grain, warm sunset light casting sharp shadows, detailed ground texture, wind-blown dust and excavation tools around, captured with Leica M6 + 28mm Summicron lens, 35mm film look, National Geographic magazine cover style, deep tones, high contrast, story-driven composition
1990s professional work photo of woman in her twenties; she's irish with short red hair
Kitchen, neo classic style, beige color with wood, gas stove and display case on the left side, refrigerator and supermarket next to it on the right side, sink under the window next to the laundry on the north side
An illustration of an elegant woman in Renaissance attire, with one hand holding up her head and the other reaching towards her mouth to eat ice cream. The background is ornate and detailed, with elements like a fleur-de-lis or royal symbols. She has intricate hair styled into braids that flow down over her shoulders, wearing elaborate jewelry such as gold earrings and rings on each finger. Her dress features golden embroidery and sleeves adorned with ribbons, in the style of Rococo and oil.
artwork that takes place in a fantasy world, in outer space, showing the mother ship and other et's surrounding and ever changing earth.
A magical Disney castle standing majestically on a hill, surrounded by lush gardens and sparkling fountains, under a vibrant sunset sky with soft clouds, dreamy and fairytale atmosphere, ultra-detailed, vibrant colors, cinematic lighting, high resolution,Rendered with dramatic chiaroscuro lighting, the artwork features a gold-olive green contrast, with radiant golden light shimmering throughout. It evokes a sense of mystery, sanctity, and magic, resembling a Renaissance painting interwoven with elements of fantasy. The composition is highly realistic, capturing both the ethereal and the tangible in perfect balance movie poster dark tone
Logo in a futuristic style, with the text "Opponent". A logo for an epic fantasy video game called 'Opponent', with red letters on a black background, and a solid background. in a medieval font, 3D, Logo in a futuristic style, with the text "Opponent" on a solid white background with golden and red lighting effects. The text is in a medieval font with gold leafing, creating a cinematic light effect against the solid background. The logo design for the game "Belover" features a temple in the background, with a simple and clean design and a golden shimmer. A futuristic-style logo featuring the text "Opponent" in red metallic words against a black background. The design incorporates a temple in ruins theme, fantasy elements, and a golden light with vibrant colors. The text is presented in a fantasy-style font, and the image is in high resolution with a black isolated background.
A blonde woman is sitting on top of a mid-century modern desk talking on a vintage teal telephone, wearing hot pink shoes and an orange 1950s retro vintage pantsuit. She is looking directly into the camera, smiling. The scene features mid-century modern office furniture and wallpaper behind her with teal, aquamarine pink and orange geometric patterns.
A flat lay of a content planner, smartphone with social media apps open, handwritten notes and sketches of post ideas, coffee cup nearby. Bright and modern aesthetic with natural light highlighting creative planning for social media.
air brush cinematic illustration of two black people sitting on a balcony drinking wine in the distance on the street there are shadows of a protesters the streets smoke in the air and chaos. the women do not seem bothered as they enjoy themselves they are observing the protest in silence realism, splash
A hyper-realistic cinematic wide shot of an epic battle between Spider-Man in his black Venom symbiote suit and a monstrous, dragon-like creature. Spider-Man, his athletic body fully sheathed in a glistening black suit that reflects the fiery glow around him, is captured in a dynamic, lunging mid-air pose. His entire body is coiled with explosive energy, muscles taut and powerful, as he swings his right arm forward in a forceful punch aimed directly at the creature’s massive, snarling face. The white spider emblem on his chest is illuminated by the glow of fires raging all around, contrasting against the obsidian-black tendrils of the Venom symbiote rippling along his suit. The monstrous creature is a towering, nightmarish figure with thick, scaly skin that glows molten red from cracks in its armored hide. Its head is adorned with jagged, spiky horns, and its cavernous mouth is open in a furious roar, rows of gleaming, dagger-like teeth glistening in the flickering inferno. Glowing eyes radiate with pure rage, and arcs of flame lick from its nostrils as it braces for the incoming blow. One clawed hand reaches forward, the other gripping a twisted steel beam torn from a nearby building, while its tail whips through the air behind it, adding to the chaotic, battle-hardened presence. The surrounding cityscape is a smoldering ruin, flames consuming shattered buildings and overturned cars, with embers and smoke swirling around the combatants. Light from the fires dances across the wet pavement, reflecting the molten glow of the beast’s scales and the glistening surface of Spider-Man’s symbiote suit. The air is alive with crackling energy, the moment frozen in time just before Spider-Man’s powerful fist connects with the beast’s monstrous head. 
No motion blur or stylized effects every detail, from the textures of Spider-Man’s web-patterned gloves to the molten fractures in the monster’s hide and the glint of sweat on Spider-Man’s forehead, is captured in maximum photorealistic realism, 100% live-action realism,
Luscious Hair, freckled cheeks, pouting lips, rose pink lips, licking a lollipop, holding a rose, color ink sketch, plain black background, clean lines, clean outlines, digital art, full color
ultra-realistic scene of a person standing on a balcony during golden hour, hand holding a cup of coffee in one hand and a lit cigarette in the other, smoke rising gently, blurred modern city skyline in the background, soft focus on city, sharp focus on hands, cinematic lighting, moody atmosphere, shallow depth of field, photo-real detail, dramatic color grading
Immense, circular, multi-tiered rotunda, walls lined with countless deep niches, each holding a single, large, dark, polished stone sphere. Matte black basalt architecture. A narrow, spiraling ramp ascends the inner perimeter. A soft, warm amber light emanates from a central concave depression in the floor. Extremely low ambient light, profound silence, ancient repose. Style of a primordial archive, hypogeum. Photorealistic, cinematic, deeply atmospheric.
Close-up on some onlookers whispering and casting side glances, their expressions hint at envy or discomfort towards the person’s uplifted spirit. photorealistic image captured on a canon camera
I'm not sure we can use the term AI Photography. AI image-generation can never be photography, but any genre of photography can be mimicked by AI imagery. This is itself a genre of imagery and it can be very engaging.* In my experience, AI images can even be photographically inspirational. But many people are challenged by their falseness.
As a photographer my views on AI have shifted significantly. I’ve come to see my generated content as legitimate works of fiction. Creating images in this way isn't that different to writing an imaginary story—fundamentally it's based on life experiences and an understanding of reality, but isn't true. AI images aren't a truthful record of reality captured through a lens, but that doesn't mean they have no substance or appeal.
Seen in this way, photographers can appreciate and create fictional depictions of the real world. Many will go further still and use their fertile imagination to generate images that visualise all kinds of surreal make-believe. As with fictional novels, there are no limits. But viewers should know when they are looking at computer-generated imagery.
There's a fundamental shift, however, when it comes to art. This is a context where we definitely need to be cautious and more protective. I don’t know about you but I'm bothered by the way AI trawls through the original artistic creations, old and new, that fill the Internet. Copyright is being ignored. However, check out this site where "The artistic [r]evolution of AI" is carefully discussed.
When an artistic work is very clearly copied it undermines the creative talent of a dedicated artist. But in the world of business, when there's lots of money to be made, dubious practices will be inevitable. It always pays to manipulate the customer and find creative ways to bend the rules. Big tech knows all about it.
How does text-to-image work? According to the Exxact Corporation: "Denoising diffusion models are trained to pull patterns out of noise, to generate a desirable image. The training process involves showing model examples of images (or other data) with varying levels of noise determined according to a noise scheduling algorithm, intending to predict what parts of the data are noise. If successful, the noise prediction model will be able to gradually build up a realistic-looking image from pure noise, subtracting increments of noise from the image at each time step."
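For the curious, here is a toy Python sketch of that idea. It is not how ChatGPT, Sora or Midjourney are actually built. It assumes a tiny five-pixel "image", a simple linear noise schedule, and an oracle that already knows the clean image standing in for the trained noise-prediction network. The function names (`make_alpha_bars`, `add_noise`, `oracle_predict_noise`, `denoise`) are my own and purely illustrative. The reverse loop uses a deterministic step in the style of DDIM sampling, which is one of several ways to take the noise back out.

```python
import math
import random


def make_alpha_bars(steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule: how much of the clean signal survives at each step."""
    alpha_bars = []
    running = 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / (steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(clean, alpha_bar, rng):
    """Forward process: blend the clean pixels with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    noisy = [math.sqrt(alpha_bar) * c + math.sqrt(1.0 - alpha_bar) * n
             for c, n in zip(clean, noise)]
    return noisy, noise


def oracle_predict_noise(noisy, clean, alpha_bar):
    """ASSUMPTION: a perfect predictor that cheats by knowing the clean image.
    In a real system a trained neural network makes this guess instead."""
    return [(x - math.sqrt(alpha_bar) * c) / math.sqrt(1.0 - alpha_bar)
            for x, c in zip(noisy, clean)]


def denoise(noisy, clean, alpha_bars):
    """Reverse process: remove a little predicted noise at every time step."""
    x = list(noisy)
    for t in range(len(alpha_bars) - 1, -1, -1):
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = oracle_predict_noise(x, clean, ab_t)
        # Estimate the clean image from the current noisy one.
        x0_hat = [(xi - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
                  for xi, e in zip(x, eps)]
        # Step to the slightly less noisy image of the previous time step.
        x = [math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * e
             for x0, e in zip(x0_hat, eps)]
    return x


rng = random.Random(0)
pixels = [0.0, 0.25, 0.5, 0.75, 1.0]       # a five-pixel stand-in for an image
alpha_bars = make_alpha_bars(1000)
noisy, _ = add_noise(pixels, alpha_bars[-1], rng)
restored = denoise(noisy, pixels, alpha_bars)
print([round(v, 3) for v in noisy])
print([round(v, 3) for v in restored])
```

Because the oracle cheats, the restored pixels match the originals almost exactly. A real model has to learn that prediction from training images, and it is steered by the text prompt, which is where the saddles and six-toed feet creep in.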

If you sign up for ChatGPT you can use Sora too. Sora gives you some idea what people are using AI image-generation for. A lot of it is the inane content you’d expect. About 5% is really impressive and inspirational. If you spend 10 minutes scrolling through it you’ll see the famous and the infamous—and politicians—in all kinds of contexts and taking selfies, 1000 different versions of Trump, lots and lots and lots and lots of cats (and puppies and lions and monkeys), the Pope doing what he shouldn’t, lots of absurdly pretty young women, many versions of Jesus (with long hair as usual), green aliens… Oh boy.
Anyway, this isn't the place to get into that big debate and wrestle with all the ambiguities and bias. Few visit here anyway! Instead, in my AI galleries I've included some of the images that were created from hundreds of prompts. (Excuse the fun examples. It's hard to resist!) Many of the original PNG images were enhanced using the generative AI tool in Photoshop. For example, the image of the man running across the wet street originally had no falling hat, there was no smoke coming from the old man's pipe, and Mona Lisa's brush was too modern.




*AI images are relatively small and best suited to viewing on a screen, but that's the most popular way to display images these days. And they can be significantly enlarged. I increase the pixel dimensions of all the images you can see here in my AI galleries. Prints are possible too.