Final Project – Building a Data-Driven Python Application
Your challenge in this assignment is to develop an interactive, data-driven, web-based Python application that demonstrates your mastery of many coding concepts as you work with real-world data. You will use the Pandas and NumPy modules for managing and interacting with data, Matplotlib or Pandas charts for plotting, and the Streamlit.io package for creating interactive web applications in Python.
Interact with Real-World Data
Choose one of these data sets:
National Parks in New York Description
Download CSV (from data.gov)
Colleges and Universities in the United States Description
Download CSV (from data.gov)
New York City Vehicle Collisions, 2015-present Description
Download CSV (sample data)
Volcanic Eruptions Description
Download CSV (cleaned data)
Used cars for sale on Craigslist Description
Download CSV (sampled data)
Skyscrapers around the World Description
Download CSV (adapted from Wikipedia)
To ensure that students create a variety of projects, you will sign up for the data set you want during class. If you miss class, or if the sign-ups are not approximately evenly distributed, I will assign a data set to you.
Demonstrate Your Python Coding Skills
Your Python code should demonstrate your coding skills by implementing several concepts we studied throughout the course that are appropriate for your project, such as:
• Coding Fundamentals: data types, if statements, loops, formatting, etc.
• Data Structures: Interact with lists, tuples, dictionaries (keys, values, items)
• Functions: passing positional and optional arguments, returning values
• Files: Reading data from a CSV file into a DataFrame
• statistics or pandas module functions for calculating mean, median, etc.
• Matplotlib or pandas for creating different types of charts
• Streamlit.io for making interactive applications that take input through UI widgets and display charts and maps
• NumPy functions for interacting with arrays (such as np.arange), only if we cover them in class
• pandas DataFrames for interacting and manipulating large data sets using filtering, sorting, pivot tables, etc.
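As a rough illustration of several of these skills together (reading CSV data into a DataFrame, building a list, computing statistics with both the statistics module and pandas, and storing tuples in a dictionary), here is a minimal sketch. The column names and values are made up; io.StringIO stands in for a downloaded CSV file so the example runs on its own.

```python
import io
import statistics

import pandas as pd

# Made-up sample standing in for a downloaded CSV file (hypothetical columns).
# With the real file you would call pd.read_csv("your_file.csv") instead.
SAMPLE_CSV = """name,country,elevation_m
Peak A,Italy,1000
Peak B,Italy,1500
Peak C,Japan,2000
Peak D,Japan,4000
Peak E,United States,6500
"""


def load_data(source):
    """Read a CSV file path or file-like object into a DataFrame."""
    return pd.read_csv(source)


df = load_data(io.StringIO(SAMPLE_CSV))

# A plain Python list built from one DataFrame column
elevations = df["elevation_m"].tolist()

# The same statistics computed two ways: statistics module vs. pandas methods
mean_stat = statistics.mean(elevations)
median_stat = statistics.median(elevations)
mean_pd = df["elevation_m"].mean()
median_pd = df["elevation_m"].median()

# A dictionary mapping each country to a tuple of (number of rows, maximum elevation)
summary = {}
for country, group in df.groupby("country"):
    summary[country] = (len(group), int(group["elevation_m"].max()))

for country, (count, tallest) in summary.items():
    print(f"{country}: {count} peak(s), tallest {tallest:,} m")
print(f"Mean: {mean_stat:,.1f} m, median: {median_stat:,} m")
```

Either approach to the statistics is fine; the statistics module works on plain lists, while the pandas methods work directly on a column.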
Part 1. Design
The purpose of this part is to get you thinking about what you might do before you start coding. Identify at least two different queries or questions you can ask about your data set, and ways to interact with and present the data based on your understanding of pandas DataFrames, Matplotlib, and the Streamlit.io package. Use a combination of charts, graphs, maps, word clouds, or other presentation tools.
Describe how your queries will be interactive by incorporating Streamlit’s user interface elements to obtain user input. Describe how you will visually present the data using charts, graphs, Streamlit tables, or maps. For example, if analyzing housing data, you might use a dropdown list to choose a neighborhood and a slider to specify a price range. You might then display all rooms for rent in that neighborhood within that price range using a table, chart, or map. (That’s an easy one. At least one of your queries needs to be more complex than this!)
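The housing example boils down to a two-condition DataFrame filter. A minimal sketch, assuming hypothetical columns named neighborhood and price and made-up rows, might look like this; filter_listings is an illustrative name, not a required function.

```python
import pandas as pd


def filter_listings(df, neighborhood, price_range):
    """Return listings in one neighborhood whose price is within price_range (inclusive)."""
    low, high = price_range  # unpack a (min, max) tuple
    mask = (df["neighborhood"] == neighborhood) & df["price"].between(low, high)
    return df[mask].sort_values("price")


# Made-up listings for illustration; the column names are assumptions.
listings = pd.DataFrame({
    "neighborhood": ["Downtown", "Riverside", "Downtown", "Hilltop", "Downtown"],
    "price": [2400, 1800, 3100, 1500, 2200],
})

result = filter_listings(listings, "Downtown", (2000, 3000))
print(result)
```

In a Streamlit app, the two arguments could come from sidebar widgets, for example `st.sidebar.selectbox("Neighborhood", sorted(listings["neighborhood"].unique()))` and `st.sidebar.slider("Price range ($)", 0, 5000, (1000, 3000))` (a slider whose default value is a tuple returns a (low, high) tuple), with `st.dataframe(result)` displaying the filtered rows. Keeping the filtering in its own function makes it easy to test without running the UI.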
Try to make your page as “user friendly” and as “polished” as possible. Include labels on all controls that require user interaction, and make sure your charts have titles, legends, or explanations that would help the user. Don’t settle for default display settings; customize your charts so they look nice!
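Customizing a chart mostly means calling a few extra Matplotlib methods on the Axes object. The sketch below is one possible approach, not a required style: make_bar_chart is a hypothetical helper that takes a pandas Series of counts (for example, the result of value_counts()) and returns a Figure with a title, axis labels, rotated tick labels, colors, gridlines, and a legend. The counts themselves are made up.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def make_bar_chart(counts, title, color="steelblue"):
    """Return a Figure that draws a pandas Series of counts as a customized bar chart."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(list(counts.index), counts.values, color=color, edgecolor="black",
           label="Number of records")
    ax.set_title(title, fontsize=14, fontweight="bold")
    ax.set_xlabel(counts.index.name or "Category")
    ax.set_ylabel("Count")
    ax.tick_params(axis="x", labelrotation=45)  # tilt long category names
    ax.grid(axis="y", linestyle="--", alpha=0.5)
    ax.legend(loc="upper right")
    fig.tight_layout()
    return fig


# Made-up counts; in your app these might come from df["some_column"].value_counts()
counts = pd.Series([5, 3, 2], index=pd.Index(["Italy", "Japan", "Chile"], name="Country"))
fig = make_bar_chart(counts, "Records by Country")
```

In Streamlit, `st.pyplot(fig)` displays the returned Figure in the main content area. Returning the Figure instead of drawing it inside the function keeps the chart logic separate from the UI code.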
Create a Word document describing your plans. Submit it on Blackboard. I will respond within 24 hours on Blackboard approving your proposed questions or making suggestions if they appear to be too complicated or too easy.
You may change your queries or visualizations after you start coding if you need to change your plans. If you do this, please notify me during the coding week.
Feel free to add to your project as you explore pandas and Streamlit capabilities and find cool ways to implement new or additional features. Part of your grade will be a “complexity/originality” score. If you use a module or do something cool that we may not have discussed in class, or implement more than the minimum requirements, you will receive a higher score. A complexity score of 1 means you implemented the minimum requirements for this project; a score of 0 means you implemented less than the minimum.
Part 2. Code.
Create your Python application with a Streamlit UI and the various visualizations. Create at least two different types of charts or graphs (with custom legends, axis labels, tick marks, colors, or other features); one of them may be a map showing latitude and longitude. Be sure to include appropriate context or labels in your user interface to cue the user about which values to specify and the purpose of each chart or graph. You may wish to add a few sentences explaining each chart. Place all UI controls in the left sidebar and your visualizations in the main content area. Make your application look as professional as you can.
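If you choose a map, the main preparation step is usually cleaning the coordinate columns. Streamlit’s st.map recognizes columns named lat and lon, so one hedged approach is to select the coordinates, rename them, convert any text to numbers, and drop rows with missing values. The helper name prepare_map_data and the original column names latitude and longitude are assumptions; check the actual headers in your data set.

```python
import pandas as pd


def prepare_map_data(df, lat_col="latitude", lon_col="longitude"):
    """Return just the coordinates, renamed to lat/lon, with unusable rows removed."""
    coords = df[[lat_col, lon_col]].rename(columns={lat_col: "lat", lon_col: "lon"})
    coords = coords.apply(pd.to_numeric, errors="coerce")  # text like "n/a" becomes NaN
    return coords.dropna()


# Made-up locations; the original column names are assumptions
places = pd.DataFrame({
    "name": ["Site A", "Site B", "Site C"],
    "latitude": [40.71, None, "44.30"],
    "longitude": [-74.01, -73.90, "n/a"],
})

map_df = prepare_map_data(places)
print(map_df)
```

The cleaned DataFrame could then be passed to `st.map(map_df)`. Because the column names are default parameters, the same helper also works for a file whose headers differ, e.g. `prepare_map_data(other_df, "LAT", "LON")`.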
Post your code to Blackboard before the start of our last class.
As you write your code, be sure to demonstrate your mastery of these capabilities in your project:
• At least one function that has two parameters and returns a value
• At least one function that does not return a value
• Interacting with dictionaries, lists, and tuples
• Using a Python module to calculate a statistical function such as average, median, mode, etc.
• User Interface and dashboard with Streamlit.io
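The function requirements above (and the rubric item for a default parameter called both with and without its default) can be met with very small functions. This sketch is only an example of the shapes involved; average_of and show_summary are hypothetical names, and the records are made up.

```python
import statistics


def average_of(records, field):
    """Return the mean of one field across a list of dictionaries (two parameters, returns a value)."""
    values = [row[field] for row in records if row.get(field) is not None]  # list comprehension
    return statistics.mean(values)


def show_summary(label, value, decimals=1):
    """Print a formatted summary line; this function returns None."""
    print(f"{label}: {value:,.{decimals}f}")


# Made-up records; each dictionary stands for one row of a data set
cars = [
    {"model": "sedan", "price": 8500, "year": 2012},
    {"model": "truck", "price": 15200, "year": 2016},
    {"model": "coupe", "price": 6100, "year": 2009},
]

avg_price = average_of(cars, "price")
show_summary("Average price ($)", avg_price)              # default decimals=1 -> Average price ($): 9,933.3
show_summary("Average price ($)", avg_price, decimals=0)  # overrides default -> Average price ($): 9,933
```

In a Streamlit app, show_summary could call `st.write` instead of `print`; either way it performs an action rather than returning a value.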
Your code should demonstrate your mastery of at least three Pandas capabilities as appropriate for your queries and data. These include:
• Sorting data in ascending or descending order, multi-column sorting
• Filtering data by one or more conditions
• Analyzing data with pivot tables
• Managing rows or columns
• Adding, dropping, selecting, creating, or grouping columns; frequency counts; and other features as you wish
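Each of the pandas capabilities listed above is typically a single method call. The following sketch runs through them on a small made-up table of used-car listings; the column names are assumptions chosen for illustration, not the columns of any specific course data set.

```python
import pandas as pd

# Made-up used-car listings; the column names are assumptions for illustration
cars = pd.DataFrame({
    "manufacturer": ["ford", "toyota", "ford", "honda", "toyota", "ford"],
    "type": ["truck", "sedan", "sedan", "sedan", "truck", "truck"],
    "year": [2015, 2018, 2012, 2018, 2016, 2019],
    "price": [18000, 16000, 7000, 15000, 24000, 30000],
})

# Create a new column derived from an existing one
cars["price_k"] = cars["price"] / 1000

# Multi-column sort: newest first, then cheapest first within each year
by_year_price = cars.sort_values(["year", "price"], ascending=[False, True])

# Filter on multiple conditions: wrap each condition in parentheses and combine with &
recent_trucks = cars[(cars["type"] == "truck") & (cars["year"] >= 2016)]

# Select a subset of columns, and drop a column
prices_only = cars[["manufacturer", "price"]]
trimmed = cars.drop(columns=["price_k"])

# Frequency count of a categorical column
make_counts = cars["manufacturer"].value_counts()

# Group and aggregate
avg_price_by_make = cars.groupby("manufacturer")["price"].mean()

# Pivot table: average price for each manufacturer/type pair (NaN where no rows exist)
pivot = cars.pivot_table(index="manufacturer", columns="type", values="price", aggfunc="mean")
print(pivot)
```

Any of these results can be shown in Streamlit with `st.dataframe(...)`, and the filter values (such as the vehicle type or minimum year) are natural candidates for sidebar widgets.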
Usual rules about writing “good” code apply:
• Make your code as modular and easy to follow as possible.
• Include a docstring, comments, and meaningful variable names.
• If you did something “cool” in your code that you are incredibly proud of, please write a comment calling attention to it.
• If you referred to any online articles or other information beyond class examples, please be sure to list them as references in your code.
• Make sure the program runs and the output is correct.
Part 3. Present.
Plan to present your project in class during the last week. Demonstrate your project running in the browser, highlighting what you feel is its most interesting part, and then describe at least one section of your code that you are most proud of. Display the code and explain the pandas and Streamlit code well enough to convince me that you understand how your code works and what you did.
Part 4. Publish (Extra Credit)
Publish your application to the web by following these Streamlit Sharing instructions. This is a newly released feature, and it may take a few days before your request is filled, so sign up for the invite now! As an alternative, you can deploy it to a server on Heroku by following these instructions.
This project counts for 15 percent of your final course grade and is worth 50 points, as follows:
Project – Proposal, design, and queries submitted on time (4 points)
Design – User interface: at least three Streamlit UI controls, with a professional page appearance (9 points)
Coding – At least one function with at least two parameters that returns a value (2 points)
Coding – At least one function with a default parameter, called more than once in your application (once with the default value and once without) (2 points)
Coding – Interacting with a dictionary* (3 points)
Coding – Interacting with lists or list comprehension* (3 points)
Coding – Pandas features: at least three (sort, filter, multiple-condition filter, pivot table, etc.) (12 points)
Coding – At least two different charts with custom legends, labels, tick marks, titles, colors, or other features; at most one can be a map (10 points)
Coding – Well documented, efficient, modular (2 points)
Project – Complexity compared to other student projects (3 points)
0 = Your project implements less than the minimum requirements
1 = Your project implements the minimum requirements
2 = Your project includes some complex queries, charts, or UI features
3 = You went above and beyond the requirements, either by doing more than what is required or by including features, modules, or packages learned independently or not described in class
Extra Credit – Publish your project on the web (5 points)
• This is a final project, so please do not discuss your program with anyone other than your instructor.
• You can ask CIS Sandbox tutors for assistance on related or general topics, but you cannot ask them to help you write your code for this project. For example, you can ask tutors to help review examples of how to create bar charts in Python (in general), but you cannot ask them to help you debug a bar chart you might create using the data set for this project.
• You can ask CIS Sandbox tutors for help with fixing syntax or runtime errors.