You are an independent real estate agent operating in Murray, KY area. Before you start your business, you want to do obtain a complete list of all real estate listings in Murray and to decide which areas of Murray to concentrate on.
1. Read the following paper by Hadley Wickham on Tidy Data: http://vita.had.co.nz/papers/tidy-data.pdf . Please don’t worry if some of the things discussed in the paper are not very clear. All you need to do is to discuss the general criteria that a data set needs to meet in order to be judged to be “tidy”. Please list and briefly discuss these criteria here [20%]:
Three criteria that are used to judge whether the dataset is tidy is assess to variables that is variable are column headers, each observation forms the rows and each type of the observation unit forms the table. Each column has a variable name that measures the underlying attribute across an observation. Rows represent every fact from the ground leading to assigning every value to a variable name and an observation. Combing this kind of a structure leads to a table of observation that conforms to a tidy dataset. Focusing on this kind of a single dataset using statistical language helps in analyzing and interpretation of results.
2. Go to trulia.com website and search for all real estate listings in Murray, KY (hint: search by city and state). There should be several hundred listings. You need to download all the real estate listings and put them into an Excel sheet. Please make sure you dataset in Excel meets the “tidy data” requirements you discussed earlier. Please make sure you save this file for later use in the course. While you can certainly do the data import manually, you would certainly be better off using some sort of a “web scrapping” tool. In the space provided below, please explain how you downloaded, cleaned and imported this data into Excel [40%]:
The web scrapping tools used is the import.io that converts web pages into data. An application program interface automatically captures data from a web page into a table. The table is in excel format was downloaded into the documents folder.
Cleaning of the data was based on the questions given below which requires only name of the real estate and the price. The variables are real state and price. This resulted to eliminating all columns apart from real estate name and the price attached to the estate.
3. Use Excel to answer the following questions in relation to the dataset you obtained in the previous step [20%]:
Q1: What is the average (mean, median) price of a house in Murray, KY?
The average mean price of houses in Murray KY is $210,813.9 and the median price is $145,900.
Q2: What is the range of house prices in Murray, KY?
Range of the house prices is $10000 to $ 2,575,000.
Q3: What is the most expensive area of Murray based on housing prices? Please explain your answer. Hint: you need to find a meaningful way to divided house listings into meaningful subsets.
The most expensive area in Murray KY is the 318-weekend area.
Q4: What is the least expensive area of Murray based on housing prices? Please explain your answer. Hint: you need to find a meaningful way to divided house listings into meaningful subsets.
The least expensive in Murray KY is lots 12 Brookfield area
4. Import the dataset from your Excel sheet into a Microsoft Access table. Please make sure you specify the data types for all the attributes. In the space provided below please explain briefly how you did the import/export [20%]:
Importation of the real estate dataset was done by using the external data button on the menu bar and clicking on the excel icon of Microsoft Access interface. Selection of the dataset was done appropriately using the location file path of the dataset. Variable attributes for real estate variable is text, price data type is currency and the primary keys generated automatically by the Micrsoft Access..
Please submit this file together with your Excel and Access files via Canvas in one zip file named using the convention specified earlier