NewBie image Exp0.1 is a 3.5B-parameter DiT model developed through research on the Lumina architecture. Building on those insights, we adopt Next-DiT as the foundation and design a new NewBie architecture tailored for text-to-image generation. NewBie image Exp0.1 is trained within this newly constructed system and represents the first experimental release of the NewBie text-to-image generation framework.
We use Gemma3-4B-it as the primary text encoder, conditioning on its penultimate-layer token hidden states. We also extract pooled text features from Jina CLIP v2, project them, and fuse them into the time/AdaLN conditioning pathway. Together, Gemma3-4B-it and Jina CLIP v2 provide strong prompt understanding and improved instruction adherence.
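As a minimal illustration of the fusion path, the sketch below projects a pooled CLIP text embedding and adds it to the timestep embedding that drives AdaLN modulation. The module name and dimensions are assumptions for illustration, not the model's actual internals.

```python
import torch
import torch.nn as nn


class PooledTextFusion(nn.Module):
    """Fuse a pooled text embedding into the time/AdaLN conditioning vector.

    clip_dim / cond_dim are placeholder sizes, not the model's real dimensions.
    """

    def __init__(self, clip_dim: int = 1024, cond_dim: int = 2048):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(clip_dim, cond_dim),
            nn.SiLU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, t_emb: torch.Tensor, pooled_text: torch.Tensor) -> torch.Tensor:
        # t_emb: (B, cond_dim) timestep embedding; pooled_text: (B, clip_dim).
        # The sum feeds the AdaLN layers that modulate each DiT block.
        return t_emb + self.proj(pooled_text)
```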
We use the FLUX.1-dev 16-channel VAE to encode images into latents, which delivers richer, smoother color rendering and finer texture detail, helping preserve the visual quality of NewBie image Exp0.1.
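For reference, the snippet below shows how the FLUX.1-dev VAE encodes an image into 16-channel latents via diffusers' `AutoencoderKL`. The random tensor stands in for a real preprocessed image, and access to the gated FLUX.1-dev repository is assumed.

```python
import torch
from diffusers import AutoencoderKL

# Load only the VAE from the FLUX.1-dev repository (gated on Hugging Face).
vae = AutoencoderKL.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="vae", torch_dtype=torch.bfloat16
).to("cuda")

# Stand-in for a real image tensor normalized to [-1, 1].
image = torch.randn(1, 3, 1024, 1024, dtype=torch.bfloat16, device="cuda")

with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample()

print(latents.shape)  # (1, 16, 128, 128): 16 channels at 8x spatial downsampling
```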
NewBie image Exp0.1 is pretrained on a large corpus of high-quality anime data, enabling the model to generate remarkably detailed and visually striking anime-style images.
We reformatted the dataset text into an XML-structured format for our experiments. Empirically, this improved attention binding and attribute/element disentanglement, and also led to faster convergence.
The model also supports natural-language and tag-based inputs.
In multi-character scenes, an XML-structured prompt typically yields more accurate generations, as in the example below.
```xml
<character_1>
  <name>$character_1$</name>
  <gender>1girl</gender>
  <appearance>chibi, red_eyes, blue_hair, long_hair, hair_between_eyes, head_tilt, tareme, closed_mouth</appearance>
  <clothing>school_uniform, serafuku, white_sailor_collar, white_shirt, short_sleeves, red_neckerchief, bow, blue_skirt, miniskirt, pleated_skirt, blue_hat, mini_hat, thighhighs, grey_thighhighs, black_shoes, mary_janes</clothing>
  <expression>happy, smile</expression>
  <action>standing, holding, holding_briefcase</action>
  <position>center_left</position>
</character_1>
<character_2>
  <name>$character_2$</name>
  <gender>1girl</gender>
  <appearance>chibi, red_eyes, pink_hair, long_hair, very_long_hair, multi-tied_hair, open_mouth</appearance>
  <clothing>school_uniform, serafuku, white_sailor_collar, white_shirt, short_sleeves, red_neckerchief, bow, red_skirt, miniskirt, pleated_skirt, hair_bow, multiple_hair_bows, white_bow, ribbon_trim, ribbon-trimmed_bow, white_thighhighs, black_shoes, mary_janes, bow_legwear, bare_arms</clothing>
  <expression>happy, smile</expression>
  <action>standing, holding, holding_briefcase, waving</action>
  <position>center_right</position>
</character_2>
<general_tags>
  <count>2girls, multiple_girls</count>
  <style>anime_style, digital_art</style>
  <background>white_background, simple_background</background>
  <atmosphere>cheerful</atmosphere>
  <quality>high_resolution, detailed</quality>
  <objects>briefcase</objects>
  <other>alternate_costume</other>
</general_tags>
```
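If you generate such prompts programmatically, a small helper like the hypothetical one below (not part of the release) can assemble the XML from plain dictionaries:

```python
def xml_prompt(characters: list[dict], general: dict) -> str:
    """Assemble an XML-structured prompt from per-character and general tag dicts."""
    parts = []
    for i, fields in enumerate(characters, start=1):
        body = "".join(f"<{k}>{v}</{k}>" for k, v in fields.items())
        parts.append(f"<character_{i}>{body}</character_{i}>")
    body = "".join(f"<{k}>{v}</{k}>" for k, v in general.items())
    parts.append(f"<general_tags>{body}</general_tags>")
    return "".join(parts)


prompt = xml_prompt(
    characters=[{
        "gender": "1girl",
        "appearance": "red_eyes, blue_hair, long_hair",
        "expression": "happy, smile",
    }],
    general={"style": "anime_style", "background": "simple_background"},
)
```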
| Model | Hugging Face | ModelScope |
|---|---|---|
| NewBie image Exp0.1 | | |
| Gemma3-4B-it | | |
| Jina CLIP v2 | | |
| FLUX.1-dev VAE | | |
```bash
pip install diffusers transformers accelerate safetensors torch --upgrade
# Recommended: install FlashAttention and Triton according to your operating system.
```
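On Linux with a CUDA toolchain, the optional speedups can typically be installed as follows; exact steps vary by platform and CUDA version, so treat this as a starting point:

```bash
pip install triton
pip install flash-attn --no-build-isolation
```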
```python
import torch
from diffusers import NewbiePipeline


def main():
    model_id = "NewBie-AI/NewBie-image-Exp0.1"

    # Load the pipeline; use float16 if your GPU does not support bfloat16.
    pipe = NewbiePipeline.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
    ).to("cuda")

    prompt = "1girl"

    image = pipe(
        prompt,
        height=1024,
        width=1024,
        num_inference_steps=28,
    ).images[0]

    image.save("newbie_sample.png")
    print("Saved to newbie_sample.png")


if __name__ == "__main__":
    main()
```
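For reproducible outputs you can pass a seeded generator; `generator` is a standard diffusers pipeline argument, and we assume NewbiePipeline accepts it as well:

```python
generator = torch.Generator("cuda").manual_seed(42)
image = pipe(
    prompt,
    height=1024,
    width=1024,
    num_inference_steps=28,
    generator=generator,
).images[0]
```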

Model Weights: Newbie Non-Commercial Community License (Newbie-NC-1.0).
Code: Apache License 2.0.
This model may produce unexpected or harmful outputs. Users are solely responsible for any risks and potential consequences arising from its use.