The data for this project consist of two files:
- Udacity_AZDIAS_Subset.csv: demographics data for the general population of Germany; 891211 persons (rows) x 85 features (columns)
- Udacity_CUSTOMERS_Subset.csv: demographics data for customers of a mail-order company; 191652 persons (rows) x 85 features (columns)
The columns in the general demographics file and customers data file are the same. This file documents the features that appear in the data files, sorted in order of appearance. Sections of this file is based on the level of measurement of each feature. The file "AZDIAS_Feature_Summary.csv" contains a summary of feature attributes, including information level, data type, and codes for missing or unknown values.
- Person-level features
- Household-level features
- Building-level features
- RR4 micro-cell features
- RR3 micro-cell features
- Postcode-level features
- RR1 neighborhood features
- PLZ8 macro-cell features
- Community-level features
Best-ager typology
- -1: unknown
- 0: no classification possible
- 1: passive elderly
- 2: cultural elderly
- 3: experience-driven elderly
Estimated age based on given name analysis
- -1: unknown (missing)
- 0: unknown (cannot be determined)
- 1: < 30 years old
- 2: 30 - 45 years old
- 3: 46 - 60 years old
- 4: > 60 years old
- 9: uniformly distributed
Gender
- -1: unknown (missing)
- 0: unknown (cannot be determined)
- 1: male
- 2: female
Customer-Journey-Typology: preferred information and buying channels for consumer
- 0: unknown
- 1: Advertising- and Consumptionminimalist
- 2: Advertising- and Consumptiontraditionalist
- 3: advertisinginterested Store-shopper
- 4: advertisinginterested Online-shopper
- 5: Advertising- and Cross-Channel-Enthusiast
- 6: Advertising-Enthusiast with restricted Cross-Channel-Behaviour
Financial typology, for each dimension:
- -1: unknown
- 1: very high
- 2: high
- 3: average
- 4: low
- 5: very low
Dimension translations:
- MINIMALIST: low financial interest
- SPARER: money-saver
- VORSORGER: be prepared
- ANLEGER: investor
- UNAUFFAELLIGER: inconspicuous
- HAUSBAUER: home ownership
Most descriptive financial type for individual
- -1: unknown
- 1: low financial interest (MINIMALIST)
- 2: money-saver (SPARER)
- 3: home ownership (HAUSBAUER)
- 4: be prepared (VORSORGER)
- 5: investor (ANLEGER)
- 6: inconspicuous (UNAUFFAELLIGER)
Year of birth
- missing data encoded as 0
Vacation habits
- 1: Event travelers
- 2: Family-oriented vacationists
- 3: Winter sportspeople
- 4: Culture lovers
- 5: Nature fans
- 6: Hiker
- 7: Golden ager
- 8: Homeland-connected vacationists
- 9: Package tour travelers
- 10: Connoisseurs
- 11: Active families
- 12: Without vacation
Membership in environmental sustainability as part of youth
- 0: not a member of green avantgarde
- 1: member of green avantgarde
Health typology
- -1: unknown
- 0: classification not possible
- 1: critical reserved
- 2: sanitary affine
- 3: jaunty hedonists
Life stage, fine scale
- 1: single low-income earners of younger age
- 2: single low-income earners of middle age
- 3: single average earners of younger age
- 4: single average earners of middle age
- 5: single low-income earners of advanced age
- 6: single low-income earners at retirement age
- 7: single average earners of advanced age
- 8: single average earners at retirement age
- 9: single independent persons
- 10: wealthy single homeowners
- 11: single homeowners of advanced age
- 12: single homeowners at retirement age
- 13: single top earners of higher age
- 14: low-income and average earner couples of younger age
- 15: low-income earner couples of higher age
- 16: average earner couples of higher age
- 17: independent couples
- 18: wealthy homeowner couples of younger age
- 19: homeowner couples of higher age
- 20: top earner couples of higher age
- 21: single parent low-income earners
- 22: single parent average earners
- 23: single parent high-income earners
- 24: low-income earner families
- 25: average earner families
- 26: independent families
- 27: homeowner families
- 28: top earner families
- 29: low-income earners of younger age from multiperson households
- 30: average earners of younger age from multiperson households
- 31: low-income earners of higher age from multiperson households
- 32: average earners of higher age from multiperson households
- 33: independent persons of younger age from multiperson households
- 34: homeowners of younger age from multiperson households
- 35: top earners of younger age from multiperson households
- 36: independent persons of higher age from multiperson households
- 37: homeowners of advanced age from multiperson households
- 38: homeowners at retirement age from multiperson households
- 39: top earners of middle age from multiperson households
- 40: top earners at retirement age from multiperson households
Life stage, rough scale
- 1: single low-income and average earners of younger age
- 2: single low-income and average earners of higher age
- 3: single high-income earners
- 4: single low-income and average-earner couples
- 5: single high-income earner couples
- 6: single parents
- 7: single low-income and average earner families
- 8: high-income earner families
- 9: average earners of younger age from multiperson households
- 10: low-income and average earners of higher age from multiperson households
- 11: high-income earners of younger age from multiperson households
- 12: high-income earners of higher age from multiperson households
Family type, fine scale
- 0: unknown
- 1: single
- 2: couple
- 3: young single parent
- 4: single parent with teenager
- 5: single parent with child of full age
- 6: young family
- 7: family with teenager
- 8: family with child of full age
- 9: shared flat
- 10: two-generational household
- 11: multi-generational household
Family type, rough scale
- 0: unknown
- 1: single (maps to 1 in fine scale)
- 2: couple (maps to 2 in fine scale)
- 3: single parent (maps to 3-5 in fine scale)
- 4: family (maps to 6-8 in fine scale)
- 5: multiperson household (maps to 9-11 in fine scale)
Social status, fine scale
- 1: typical low-income earners
- 2: orientation-seeking low-income earners
- 3: aspiring low-income earners
- 4: villagers
- 5: minimalistic high-income earners
- 6: independent workers
- 7: title holder-households
- 8: new houseowners
- 9: houseowners
- 10: top earners
Social status, rough scale
- 1: low-income earners (maps to 1-2 in fine scale)
- 2: average earners (maps to 3-5 in fine scale)
- 3: independents (maps to 6-7 in fine scale)
- 4: houseowners (maps to 8-9 in fine scale)
- 5: top earners (maps to 10 in fine scale)
Nationality based on given name analysis
- -1: unknown
- 0: unknown
- 1: German-sounding
- 2: foreign-sounding
- 3: assimilated names
Dominating movement of person's youth (avantgarde vs. mainstream; east vs. west)
- -1: unknown
- 0: unknown
- 1: 40s - war years (Mainstream, E+W)
- 2: 40s - reconstruction years (Avantgarde, E+W)
- 3: 50s - economic miracle (Mainstream, E+W)
- 4: 50s - milk bar / Individualisation (Avantgarde, E+W)
- 5: 60s - economic miracle (Mainstream, E+W)
- 6: 60s - generation 68 / student protestors (Avantgarde, W)
- 7: 60s - opponents to the building of the Wall (Avantgarde, E)
- 8: 70s - family orientation (Mainstream, E+W)
- 9: 70s - peace movement (Avantgarde, E+W)
- 10: 80s - Generation Golf (Mainstream, W)
- 11: 80s - ecological awareness (Avantgarde, W)
- 12: 80s - FDJ / communist party youth organisation (Mainstream, E)
- 13: 80s - Swords into ploughshares (Avantgarde, E)
- 14: 90s - digital media kids (Mainstream, E+W)
- 15: 90s - ecological awareness (Avantgarde, E+W)
Return type
- 0: unknown
- 1: influenceable Crazy-Shopper
- 2: demanding Heavy-Returner
- 3: incentive-receptive Normal-Returner
- 4: conservative Low-Returner
- 5: determined Minimal-Returner
Personality typology, for each dimension:
- -1: unknown
- 1: highest affinity
- 2: very high affinity
- 3: high affinity
- 4: average affinity
- 5: low affinity
- 6: very low affinity
- 7: lowest affinity
- 9: unknown
Dimension translations:
- SOZ: socially-minded
- FAM: family-minded
- REL: religious
- MAT: materialistic
- VERT: dreamful
- LUST: sensual-minded
- ERL: event-oriented
- KULT: cultural-minded
- RAT: rational
- KRIT: critical-minded
- DOM: dominant-minded
- KAEM: combative attitude
- PFLICHT: dutiful
- TRADV: tradional-minded
Shopper typology
- -1: unknown
- 0: external supplied hedonists
- 1: Shopping-stressed
- 2: family-shopper
- 3: demanding shopper
Small office / home office flag
- -1: unknown
- 0: no small office/home office
- 1: small office/home office
Academic title flag
- -1: unknown
- 0: unknown
- 1: Dr.
- 2: Dr. Dr.
- 3: Prof.
- 4: Prof. Dr.
- 5: other
Insurance typology
- -1: unknown
- 1: social-safety driven
- 2: individualistic-accepting risks
Energy consumption typology
- -1: unknown
- 1: green
- 2: smart
- 3: fair supplied
- 4: price driven
- 5: seeking orientation
- 6: indifferent
- 9: unknown
Birthdate of head of household
- 0: unknown / no main age detectable
- 1: 1895-01-01 to 1899-12-31
- 2: 1900-01-01 to 1904-12-31
- 3: 1905-01-01 to 1909-12-31
- 4: 1910-01-01 to 1914-12-31
- 5: 1915-01-01 to 1919-12-31
- 6: 1920-01-01 to 1924-12-31
- 7: 1925-01-01 to 1929-12-31
- 8: 1930-01-01 to 1934-12-31
- 9: 1935-01-01 to 1939-12-31
- 10: 1940-01-01 to 1944-12-31
- 11: 1945-01-01 to 1949-12-31
- 12: 1950-01-01 to 1954-12-31
- 13: 1955-01-01 to 1959-12-31
- 14: 1960-01-01 to 1964-12-31
- 15: 1965-01-01 to 1969-12-31
- 16: 1970-01-01 to 1974-12-31
- 17: 1975-01-01 to 1979-12-31
- 18: 1980-01-01 to 1984-12-31
- 19: 1985-01-01 to 1989-12-31
- 20: 1990-01-01 to 1994-12-31
- 21: 1995-01-01 to 1999-12-31
Number of adults in household
Number of professional academic title holders in household
Estimated household net income
- -1: unknown
- 0: unknown
- 1: highest income
- 2: very high income
- 3: high income
- 4: average income
- 5: lower income
- 6: very low income
Consumer pattern over past 12 months
- -1: unknown
- 1: regular customer
- 2: active customer
- 3: new costumer
- 4: stray customer
- 5: inactive customer
- 6: passive customer
Likelihood of children in household
- -1: unknown
- 0: unknown
- 1: most likely
- 2: very likely
- 3: likely
- 4: average
- 5: unlikely
- 6: very unlikely
Length of residence
- -1: unknown
- 0: unknown
- 1: length of residence less than 1 year
- 2: length of residence 1-2 years
- 3: length of residence 2-3 years
- 4: length of residence 3-4 years
- 5: length of residence 4-5 years
- 6: length of residence 5-6 years
- 7: length of residence 6-7 years
- 8: length of residence 7-10 years
- 9: length of residence more than 10 years
Number of households in the building
- missing values encoded by 0
Number of professional academic title holders in building
Type of building (residential vs. commercial)
- -1: unknown
- 0: unknown
- 1: residential building
- 2: residential building buildings without actually known household
- 3: mixed (=residential and company) building
- 4: mixed building without actually known household or company
- 5: company building w/o known company
- 6: mixed building without actually known household
- 7: company building
- 8: mixed building without actually known company
Distance from building to point of sale (PoS)
- 1: building is located in a 125 x 125m grid cell (RA1), which is a consumption cell
- 2: building is located in a 250 x 250m grid cell that includes at least one RA1-consumption cell
- 3: building is located in a 500 x 500m grid cell that includes at least one RA1-consumption cell
- 4: building is located in a 1 x 1km grid cell that includes at least one RA1-consumption cell
- 5: building is located in a 2 x 2km grid cell that includes at least one RA1-consumption cell
- 6: building is located in a 10 x 10km grid cell that includes at least one RA1-consumption cell
- 7: building is not located in a 10 x 10km range of a consumption cell
First year building was mentioned in the database
- missing values encoded by 0
Building location via former East / West Germany (GDR / FRG)
- -1: unknown
- O: East (GDR)
- W: West (FRG)
Neighborhood quality (or rural flag)
- -1: unknown
- 0: no score calculated
- 1: very good neighborhood
- 2: good neighborhood
- 3: average neighborhood
- 4: poor neighborhood
- 5: very poor neighborhood
- 7: rural neighborhood
- 8: new building in rural neighborhood
German CAMEO: Wealth / Life Stage Typology, rough scale
- -1: unknown
- 1: upper class
- 2: upper middleclass
- 3: established middleclass
- 4: consumption-oriented middleclass
- 5: active middleclass
- 6: low-consumption middleclass
- 7: lower middleclass
- 8: working class
- 9: urban working class
- X: unknown
German CAMEO: Wealth / Life Stage Typology, detailed scale
- 1A: Work-Life-Balance
- 1B: Wealthy Best Ager
- 1C: Successful Songwriter
- 1D: Old Nobility
- 1E: City Nobility
- 2A: Cottage Chic
- 2B: Noble Jogger
- 2C: Established gourmet
- 2D: Fine Management
- 3A: Career & Family
- 3B: Powershopping Families
- 3C: Rural Neighborhood
- 3D: Secure Retirement
- 4A: Family Starter
- 4B: Family Life
- 4C: String Trimmer
- 4D: Empty Nest
- 4E: Golden Ager
- 5A: Younger Employees
- 5B: Suddenly Family
- 5C: Family First
- 5D: Stock Market Junkies
- 5E: Coffee Rider
- 5F: Active Retirement
- 6A: Jobstarter
- 6B: Petty Bourgeois
- 6C: Long-established
- 6D: Sportgardener
- 6E: Urban Parents
- 6F: Frugal Aging
- 7A: Journeymen
- 7B: Mantaplatte
- 7C: Factory Worker
- 7D: Rear Window
- 7E: Interested Retirees
- 8A: Multi-culteral
- 8B: Young & Mobile
- 8C: Prefab
- 8D: Town Seniors
- 9A: First Shared Apartment
- 9B: Temporary Workers
- 9C: Afternoon Talk Show
- 9D: Mini-Jobber
- 9E: Socking Away
- XX: unknown
German CAMEO: Wealth / Life Stage Typology, mapped to international code
- -1: unknown
- 11: Wealthy Households - Pre-Family Couples & Singles
- 12: Wealthy Households - Young Couples With Children
- 13: Wealthy Households - Families With School Age Children
- 14: Wealthy Households - Older Families & Mature Couples
- 15: Wealthy Households - Elders In Retirement
- 21: Prosperous Households - Pre-Family Couples & Singles
- 22: Prosperous Households - Young Couples With Children
- 23: Prosperous Households - Families With School Age Children
- 24: Prosperous Households - Older Families & Mature Couples
- 25: Prosperous Households - Elders In Retirement
- 31: Comfortable Households - Pre-Family Couples & Singles
- 32: Comfortable Households - Young Couples With Children
- 33: Comfortable Households - Families With School Age Children
- 34: Comfortable Households - Older Families & Mature Couples
- 35: Comfortable Households - Elders In Retirement
- 41: Less Affluent Households - Pre-Family Couples & Singles
- 42: Less Affluent Households - Young Couples With Children
- 43: Less Affluent Households - Families With School Age Children
- 44: Less Affluent Households - Older Families & Mature Couples
- 45: Less Affluent Households - Elders In Retirement
- 51: Poorer Households - Pre-Family Couples & Singles
- 52: Poorer Households - Young Couples With Children
- 53: Poorer Households - Families With School Age Children
- 54: Poorer Households - Older Families & Mature Couples
- 55: Poorer Households - Elders In Retirement
- XX: unknown
Number of 1-2 family houses in the microcell
- -1: unknown
- 0: no 1-2 family homes
- 1: lower share of 1-2 family homes
- 2: average share of 1-2 family homes
- 3: high share of 1-2 family homes
- 4: very high share of 1-2 family homes
Number of 3-5 family houses in the microcell
- -1: unknown
- 0: no 3-5 family homes
- 1: lower share of 3-5 family homes
- 2: average share of 3-5 family homes
- 3: high share of 3-5 family homes
- 4: very high share of 3-5 family homes
Number of 6-10 family houses in the microcell
- -1: unknown
- 0: no 6-10 family homes
- 1: lower share of 6-10 family homes
- 2: average share of 6-10 family homes
- 3: high share of 6-10 family homes
Number of 10+ family houses in the microcell
- -1: unknown
- 0: no 10+ family homes
- 1: lower share of 10+ family homes
- 2: high share of 10+ family homes
Most common building type within the microcell
- -1: unknown
- 0: unknown
- 1: mainly 1-2 family homes in the microcell
- 2: mainly 3-5 family homes in the microcell
- 3: mainly 6-10 family homes in the microcell
- 4: mainly 10+ family homes in the microcell
- 5: mainly business buildings in the microcell
Number of buildings in the microcell
- -1: unknown
- 0: unknown
- 1: 1-2 buildings
- 2: 3-4 buildings
- 3: 5-16 buildings
- 4: 17-22 buildings
- 5: >=23 buildings
Distance to nearest urban center
- -1: unknown
- 1: less than 10 km
- 2: 10 - 20 km
- 3: 20 - 30 km
- 4: 30 - 40 km
- 5: 40 - 50 km
- 6: 50 - 100 km
- 7: more than 100 km
Density of households per square kilometer
- -1: unknown
- 1: less than 34 households per km^2
- 2: 34 - 89 households per km^2
- 3: 90 - 149 households per km^2
- 4: 150 - 319 households per km^2
- 5: 320 - 999 households per km^2
- 6: more than 999 households per km^2
Distance to city center (downtown)
- -1: unknown
- 1: in city center
- 2: less than 3 km to city center
- 3: 3 - 5 km to city center
- 4: 5 - 10 km to city center
- 5: 10 - 20 km to city center
- 6: 20 - 30 km to city center
- 7: 30 - 40 km to city center
- 8: more than 40 km to city center
Ratio of residential to commercial activity
- 1: business cell
- 2: mixed cell with high business share
- 3: mixed cell with middle business share
- 4: mixed cell with low business share
- 5: residential cell
Purchasing power in region
- -1; unknown
- 0: unknown
- 1: very high
- 2: high
- 3: average
- 4: low
Movement patterns
- 1: very high movement
- 2: high movement
- 3: middle movement
- 4: low movement
- 5: very low movement
- 6: none
Online affinity
- 0: none
- 1: low
- 2: middle
- 3: high
- 4: very high
- 5: highest
Neighborhood typology
- -1: unknown
- 0: unknown
- 1: upper class
- 2: conservatives
- 3: upper middle class
- 4: middle class
- 5: lower middle class
- 6: traditional workers
- 7: marginal groups
Number of cars in the PLZ8 region
Number of 1-2 family houses in the PLZ8 region
- -1: unknown
- 0: no 1-2 family homes
- 1: lower share of 1-2 family homes
- 2: average share of 1-2 family homes
- 3: high share of 1-2 family homes
- 4: very high share of 1-2 family homes
Number of 3-5 family houses in the PLZ8 region
- -1: unknown
- 0: no 3-5 family homes
- 1: lower share of 3-5 family homes
- 2: average share of 3-5 family homes
- 3: high share of 3-5 family homes
- 4: very high share of 3-5 family homes
Number of 6-10 family houses in the PLZ8 region
- -1: unknown
- 0: no 6-10 family homes
- 1: lower share of 6-10 family homes
- 2: average share of 6-10 family homes
- 3: high share of 6-10 family homes
Number of 10+ family houses in the PLZ8 region
- -1: unknown
- 0: no 10+ family homes
- 1: lower share of 10+ family homes
- 2: high share of 10+ family homes
Most common building type within the PLZ8 region
- -1: unknown
- 0: unknown
- 1: mainly 1-2 family homes
- 2: mainly 3-5 family homes
- 3: mainly 6-10 family homes
- 4: mainly 10+ family homes
- 5: mainly business buildings
Number of households within the PLZ8 region
- -1: unknown
- 1: less than 130 households
- 2: 131-299 households
- 3: 300-599 households
- 4: 600-849 households
- 5: more than 849 households
Number of buildings within the PLZ8 region
- -1: unknown
- 1: less than 60 buildings
- 2: 60-129 buildings
- 3: 130-299 buildings
- 4: 300-449 buildings
- 5: more than 449 buildings
Share of unemployment in community
- -1: unknown
- 1: very low
- 2: low
- 3: average
- 4: high
- 5: very high
- 9: unknown
Size of community
- -1: unknown
- 1: <= 2,000 inhabitants
- 2: 2,001 to 5,000 inhabitants
- 3: 5,001 to 10,000 inhabitants
- 4: 10,001 to 20,000 inhabitants
- 5: 20,001 to 50,000 inhabitants
- 6: 50,001 to 100,000 inhabitants
- 7: 100,001 to 300,000 inhabitants
- 8: 300,001 to 700,000 inhabitants
- 9: > 700,000 inhabitants
Share of unemployment relative to county in which community is contained
- -1: unknown
- 1: very low
- 2: low
- 3: average
- 4: high
- 5: very high
- 9: unknown