The Fashionpedia Ontology and Fashion Segmentation Dataset

Menglin Jia*¹, Mengyun Shi*¹, Mikhail Sirotenko*³, Yin Cui¹,², Bharath Hariharan¹, Claire Cardie¹, Serge Belongie¹,²
¹Cornell University ²Cornell Tech ³Google AI
*Equal contribution

Abstract
As a step toward mapping out the visual aspects of the fashion world, we introduce the Fashionpedia ontology and fashion segmentation dataset. Fashionpedia consists of two parts: (1) an ontology built by fashion experts containing 27 main apparel objects, 19 apparel parts, and 92 fine-grained attributes and their relationships, and (2) a dataset of everyday and celebrity event fashion images annotated with segmentation masks and their associated fine-grained attributes, built upon the backbone of the Fashionpedia ontology structure. The aim of our work is to cultivate research connections between the computer vision and fashion communities through the creation of a high-quality dataset and associated open competitions, thereby advancing the state of the art in fine-grained visual recognition for fashion and apparel.

1. Introduction
Fashion, in its various forms, influences many aspects of modern societies and has a strong financial and cultural impact. Recent breakthroughs in computer vision have given rise to increased interest in the visual analysis of fashion components. A key component of these technological advances is the availability of large amounts of high-quality annotated training data.
Evidence of this can be seen in the community's engagement with the COCO object recognition dataset [14] and the associated challenges that have run annually from 2015 to the present. One area that remains challenging for computers, however, is fine-grained visual recognition.

[Figure 1 image. Panel (d) legend, relationship types: Part of, Textile Finishing, Textile Pattern, Silhouette, Opening Type, Length, Nickname, Waistline.]
Figure 1. Overview of the Fashionpedia dataset: (a) the original image; (b) the image with main garment segmentation masks; (c) the image with both main garment and garment part segmentation masks; (d) an exploded view of the annotation diagram: the image is annotated with both segmentation masks and fine-grained attributes (black boxes).

Recently, we have observed an increasing effort to curate datasets for fine-grained visual recognition, ranging from the Caltech-UCSD Birds dataset [22] to the recent iNaturalist species classification and detection dataset [20]. The goal of this line of work is to advance the state of the art in automatic image classification for large numbers of real-world, fine-grained categories. What these datasets lack, however, is the ability to provide a structured representation of an image.

Understanding the fashion world requires computers not only to detect objects and attributes but also to understand the relationships and interactions between them. In light of this, we introduce the Fashionpedia ontology and image dataset with the aim of training and benchmarking computer vision models for a more comprehensive understanding of fashion.

The contributions of this work are:
• A fashion ontology informed by product descriptions from the internet and built by fashion experts. Our unified ontology captures the complex structure of fashion objects and the ambiguity in descriptions obtained from the web. It contains 46 apparel objects (27 main apparel objects and 19 apparel parts) and 92 fine-grained attributes in total.
• A dataset of around 50K clothing images from daily life, celebrity events, and online shopping, annotated by crowd workers with segmentation masks and by fashion experts with fine-grained attributes. The current version of the dataset has 10K images labeled with both segmentation masks and fine-grained attributes; the remaining 40K are labeled with segmentation masks only.
• A novel fine-grained segmentation task and an associated competition¹, created by joining forces between the fashion and computer vision communities. The proposed task unifies visual categorization and segmentation of rich apparel attributes, which we believe is an important step toward a structural understanding of fashion in real-world applications.

¹Kaggle competition website: https://www.kaggle.com/c/imaterialist-fashion-2019-FGVC6

2. Related Work
Table 1 summarizes the comparison among different datasets with clothing category and attribute labels. Our dataset distinguishes itself in the following three aspects:
• Exhaustive annotation of segmentation masks: Existing fashion datasets [5, 28] offer segmentation masks for the main garments (e.g., jacket, coat, dress) and the accessory categories (e.g., bag, shoe). Smaller garment objects such as collars and pockets are not annotated. However, these small objects could be valuable for real-world applications such as searching for a specific collar shape during online shopping. Our dataset is annotated with segmentation masks not only for a total of 27 main garment and accessory categories but also for 19 garment parts (e.g., collar, sleeve, pocket, zipper, embroidery).
• Localized attributes: The fine-grained attributes in existing datasets [15, 9, 27] tend to be noisy, mainly because the annotations are collected by crawling fashion product images associated with attribute-level descriptions directly from large online shopping websites. Unlike these datasets, the fine-grained attributes of our dataset are annotated manually by fashion experts. Furthermore, to the best of our knowledge, our dataset is the first one annotated with localized attributes: fashion experts are asked to annotate the fine-grained attributes associated with the segmentation masks labeled by the crowd workers. Localized attributes could potentially help computational models detect and understand attributes more accurately.
• Fine categorization: Previous studies on attribute categorization suffer from several issues, including: (1) repeated attributes belonging to the same category (e.g., zip, zipped, and zipper) [15, 8]; (2) containing only basic-level categorization (object recognition) and lacking fine categorization (or "subordinate categorization") [5, 28, 11, 21, 25, 24, 12, 18, 2, 19, 10, 6, 23]; (3) a lack of fashion taxonomies that meet the needs of real-world applications in the fashion industry, possibly due to the research gap between fashion design and computer vision. To better facilitate research in fashion and computer vision, our proposed ontology is built and verified by fashion experts based on four sources: (1) world-leading e-commerce fashion websites (e.g., ZARA, H&M, Gap, Uniqlo, Forever21); (2) luxury fashion brands (e.g., Prada, Chanel, Gucci); (3) trend forecasting companies (e.g., WGSN); (4) academic resources [4, 1].

Table 1. Comparison of fashion datasets (MG = Main Garment, GP = Garment Part, A = Accessory, S = Style). The apparel category annotation group covers the Classification, BBox, and Segmentation columns; the fine-grained attribute annotation group covers the Unlocalized, Localized, and Fine Categorization columns. Within a group, semicolons separate entries in different columns.

Name | Apparel category annotation (Classification / BBox / Segmentation) | Fine-grained attribute annotation (Unlocalized / Localized / Fine Categorization)
Ups and Downs [7] | MG |
Fashion550k [10] | MG, A |
Fashion-MNIST [23] | MG |
Clothing Parsing [25] | MG, A |
Chic or Social [24] | MG, A |
Hipster [12] | MG, A, S |
Runway2Realway [21] | MG, A |
ModaNet [28] | MG, A; MG, A |
DeepFashion2 [5] | MG; MG |
UTZappos50K [26] | A | X
Fashion200K [6] | MG | X
Fashion Style-128 Floats [18] | S | X
Fashion144k [17] | MG, A | X
FashionStyle14 [19] | S | X
Main Product Detection [27] | MG | X
StreetStyle-27K [16] | | X; X
UT-latent look [8] | MG, S | X; X
FashionAI [3] | MG, GP, A | X; X
Apparel classification-Style [2] | MG | X; X
DARN [9] | MG | X; X
WTBI [11] | MG, A | X; X
DeepFashion [15] | S; MG | X; X
Fashionpedia | MG, GP, A; MG, GP, A | X; X

3. Dataset Specification and Collection

3.1. Fashionpedia ontology and data representation
The Fashionpedia ontology relies on the notions of object (similar to "item" in Wikidata and "object" in Visual Genome [13]) and statement. Objects represent common items in apparel. Statements describe detailed characteristics of an object and consist of a relationship (similar to "property" in Wikidata) and an attribute (similar to "value" in Wikidata). For example, we can add a relationship that specifies the silhouette of a garment by associating an attribute for the garment silhouette, or we can assign a material type relationship to a button object by specifying a material attribute. In this section, we break down each component of the Fashionpedia ontology (Figure 2) and explain how a large-scale fashion ontology can be built upon the backbone of the Fashionpedia ontology structure.

[Figure 2 image: ontology graph linking main garments (e.g., jacket, dress, pants), garment parts (e.g., collar, sleeve, pocket, lapel, neckline), and attributes (e.g., single breasted, regular (fit), knee (length), paisley).]
Figure 2. The visualization of the Fashionpedia ontology (based on 20 image samples).

3.1.1 Main garments, garment parts, accessories, and their segmentation masks
In the Fashionpedia dataset, all images were annotated with main garments, and each main garment was also annotated with its garment parts. For example, general garment types such as jacket, dress, and pants are considered main garments. These garments also consist of several garment parts such as collars, sleeves, pockets, buttons, and embroideries. Main garments are divided into three main categories: outerwear, intimate, and accessories. Garment parts also have different types: garment main parts (e.g., collars, sleeves), bra parts, closures (e.g., button, zipper), and decorations (e.g., embroidery, ruffle). In the current version of Fashionpedia, each image contains on average 1 person, 3 main garments, 3 accessories, and 12 garment parts, each delineated by a tight segmentation mask (Figure 1 (b-c)). Furthermore, each object is canonicalized to a synset ID in our Fashionpedia ontology (Figure 2).

3.1.2 Fine-grained attributes
Each main garment and garment part is associated with apparel attributes (Figure 1 (d)). For example, "button" is part of the main garment "jacket"; "jacket" can be linked with the silhouette attribute "symmetrical"; and the garment part "button" could carry the attribute "metal" through the material relationship. Each image in Fashionpedia has an average of 16 attributes. As with main garments and garment parts, we canonicalize all attributes to our Fashionpedia ontology.
3.1.3 Relationships
There are three main types of relationships: 1) outfits to main garments and main garments to garment parts: meronymy (part-of) relationships (Figure 1 (d)); 2) main garments or garment parts to attributes: these relationship types include garment silhouette (e.g., peplum), collar nickname (e.g., Peter Pan collar), textile type (e.g., lace), textile finishing (e.g., distressed), and textile-fabric pattern (e.g., paisley), among others; 3) within garments, garment parts, or attributes: up to four levels of hyponymy (is-an-instance-of) relationships. For example, weft knit is an instance of knit fabric, and fleece is an instance of weft knit.

3.1.4 Apparel graphs
Integrating the main garments, garment parts, attributes, and relationships, we create an apparel graph representation for each outfit in an image. Each apparel graph is a structured representation of an outfit ensemble containing certain types of garments. Nodes in the graph represent main garments, garment parts, and attributes. Main garments and garment parts are linked to their respective attributes through different types of relationships. The relationships connecting garment objects and attributes point from the main garments to their attributes and from the garment parts to their corresponding attributes. Figure 1 (d) illustrates one example of the apparel graph for a jacket.

3.1.5 Fashionpedia ontology
While apparel graphs are localized representations of particular outfit ensembles in fashion images, we also create a single Fashionpedia ontology (Figure 2). The Fashionpedia ontology is the union of all apparel graphs and contains all main garments, garment parts, attributes, and relationships. This allows us to combine multiple levels of information in a more coherent way.

3.2. Image Collection
A total of 48,827 images were harvested from Flickr and free-license photo websites (Unsplash, Burst by Shopify, Freestocks, Kaboompics, and Pexels). Two fashion experts were asked to manually verify the quality of the collected images. The annotation process consists of two phases. First, crowd workers annotated segmentation masks for apparel objects. Second, 15 fashion experts were recruited to annotate the fine-grained attributes for the segmentation masks labeled in the first phase.

4. Conclusion
In this work, we propose the Fashionpedia ontology and fashion segmentation dataset. To the best of our knowledge, Fashionpedia is the first dataset that combines part-level segmentation with fine-grained attributes. The expected outcome of this project is to advance the state of the art in domain-specific fine-grained visual recognition. We expect the Fashionpedia image dataset and its associated ontology to be applicable to many applications, including better product recommendation for online shoppers, enhanced visual search results, and the resolution of ambiguous fashion-related words in textual queries. Finally, we expect that our work will act as a catalyst for increased attention to domain-specific ontologies for fashion by joining forces between the fashion, computer vision, and natural language processing communities.

5. Acknowledgements
We thank Kavita Bala, Carla Gomes, Dustin Hwang, Rohun Tripathi, Omid Poursaeed, Hector Liu, and Nayanathara Palanivel for their helpful feedback and discussion in the development of the Fashionpedia dataset. We also thank Zeqi Gu, Fisher Yu, Wenqi Xian, Chao Suo, Junwen Bai, Paul Upchurch, Anmol Kabra, and Brendan Rappazzo for their help developing the fine-grained attribute annotation tool.

References
[1] Bloomsbury.com. Fashion photography archive. Retrieved May 9, 2019 from https://www.bloomsbury.com/dr/digital-resources/products/fashion-photography-archive/.
[2] L. Bossard, M. Dantone, C. Leistner, C. Wengert, T. Quack, and L. Van Gool. Apparel classification with style. In Computer Vision – ACCV 2012, pages 321–335, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
[3] FashionAI. Retrieved May 9, 2019 from http://fashionai.alibaba.com/.
[4] Fashionary.org. Fashionpedia - the visual dictionary of fashion design. Retrieved May 9, 2019 from https://fashionary.org/products/fashionpedia.
[5] Y. Ge, R. Zhang, L. Wu, X. Wang, X. Tang, and P. Luo. DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images. arXiv:1901.07973 [cs], Jan. 2019.
[6] X. Han, Z. Wu, P. X. Huang, X. Zhang, M. Zhu, Y. Li, Y. Zhao, and L. S. Davis. Automatic spatially-aware fashion concept discovery. In ICCV, 2017.
[7] R. He and J. McAuley. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In Proceedings of the 25th International Conference on World Wide Web (WWW '16), pages 507–517, 2016. arXiv:1602.01585.