ETL Assignment

Data Warehousing
ETL Assignment
ETL is the link between the operational systems in an organization and its data warehouse.
It would be a shame to complete a course in data warehousing without having any idea
what ETL is, not only because of how many jobs involve it, but because it provides the
context for how data warehouses are created and used.
The questions for this assignment are based on those from in the exercise. You may need
to refer to both the schemas and the data itself to develop your answers to these questions:
1) Identify any columns in the transaction processing system (TPS) that do not have a
direct one-to-one correspondence with a column in the star schema. For the columns
you identify, list the table and column names as the appear in the TPS:
Transaction Processing System
Table Column
(Name of table) (Name of column)
(Name of table) (Name of column)
… …
2) Next, identify any columns that appear in the star schema that do not have a direct
one-to-one correspondence with a column in the TPS. List the table and column
names in the TPS and the table and column names in the star schema and why they
appear, such as if they are transformations from columns in the TPS. One simple form
of transformation could be a renaming of a single columns, other transformations
could include splitting a TPS column or combining it with other columns, such as
with an aggregations or calculation.
Since dimDate is an entirely new table that has many columns, for this part you only
need to consider the first three columns in your answer.
The table below shows the format to use along with two lines as examples. The first
shows a column that was pulled from the TPS and appears in the star schema after
being renamed. In indicating table and column names you can either list each
separately (as shown below) or together separated by period
(tablename.columnname) as you’d see in SQL.
The second example shows a column in the star schema that is a calculation based on
two columns from the TPS. Those two columns from different tables in the TPS and
are divided to calculate a value for a column in the star schema.
Star Schema TPS
Table Column Table Column Transformation
(Table name) (Column name) (TPS Table name) (Name of TPS column) (Renamed)
(Table name A)
(Table name B)
(Column name A)
(Column name B)
(TPS Table name)
(Name of TPS column) Calculated from TPS as
Table name A.Column nameA /
Table name B.Column name B
Probably to no surprise, the deliverable for this assignment is a single pdf file uploaded to
Blackboard. After you have the answers completed and written in your document, save it
as a pdf file, upload that file to Blackboard, and you are set.


Prof. Gustavo Vulcano
MSBA Module III − Prof. Gustavo Vulcano
Due Friday October 29th, 2021
Note: Please, submit through Brightspace in a single Word or pdf file. Make sure to include the
managerial problem definition of each of the models below (you can follow the guidelines
provided in the Decision Models course to present a model). The mathematical formulations for
these problems are not required, but it is a good practice to work on them. Also, you will need to
include parts of Excel worksheets as part of the solutions; make sure to copy-paste the
appropriate contents from the Excel file to the submission file (again, only either Word or pdf).
Question #1 (25 points)
Manufacturers in Dalton, GA, produce more than 70% of the total output of the $9 billion
worldwide carpet industry. Competition in this industry is intense and forces producers to strive
for maximum efficiency and economies of scale. It also forces producers to continuously
evaluate investments in new technology.
Kamm Industries is one of the leading carpet producers in the Dalton area. Its owner, Geoff
Kamm, asked for your assistance in planning the production schedule for the next quarter (13
weeks). The company has orders for 15 different types of carpet that can be produced on two
types of looms: Dobbie looms and Pantera looms. Pantera looms produce standard tufted
carpeting. Dobbie looms can also produce standard tufted carpeting but also allow the
incorporation of designs (such as flowers or corporate logos) into the carpeting. The following
table summarizes the orders for each type of carpet that must be produced in the coming quarter
along with their production rates and costs on each type of loom, and the cost of subcontracting
each order. Note that the first 4 orders involved special production requirements that can only be
achieved on a Dobbie loom or via subcontracting. Assume that any portion of an order may be
Demand Dobbie Pantera Subcontract
Carpet (Yds) Yd/Hr Cost/Yd Yd/Hr Cost/Yd Cost/Yd
1 14,000 4.510 $2.66 na na $2.77
2 52,000 4.796 $2.55 na na $2.73
3 44,000 4.629 $2.64 na na $2.85
4 20,000 4.256 $2.56 na na $2.73
5 77,500 5.145 $1.61 5.428 $1.60 $1.76
6 109,500 3.806 $1.62 3.935 $1.61 $1.76
7 120,000 4.168 $1.64 4.316 $1.61 $1.76
8 60,000 5.251 $1.48 5.356 $1.47 $1.59
9 7,500 5.223 $1.50 5.277 $1.50 $1.71
10 69,500 5.216 $1.44 5.419 $1.42 $1.63
11 68,500 3.744 $1.64 3.835 $1.64 $1.80
12 83,000 4.157 $1.57 4.291 $1.56 $1.78
13 10,000 4.422 $1.49 4.558 $1.48 $1.63
14 381,000 5.281 $1.31 5.353 $1.30 $1.44
15 64,000 4.222 $1.51 4.288 $1.50 $1.69
Prof. Gustavo Vulcano
Kamm currently owns and operates 15 Dobbie looms and 80 Pantera looms. To maximize
efficiency and keep pace with demand, the company operates 24/7. Each machine is down for
routine maintenance for approximately 2 hr/week.
a) Formulate a linear programming model for this problem that can be used to determine the
optimal production/subcontracting plan. Make sure to include the managerial problem
definition. Assume that produced and purchased quantities can take continuous values (e.g.,
you could produce 9562.34 yards of a carpet type). Identify:
i. Decision variables
ii. Objective function
iii. All relevant constraints.
Note: There is an Excel template available for this problem.
b) Use Excel Solver to determine the optimal solution. Include a copy of the Solver Answer
Report as part of the main file.
c) By editing the model in (a) and explaining those edits, answer the questions below:
1) What would happen to the total cost if one of the Dobbie machines broke and could not
be used at all during the quarter?
2) What would happen to the total cost if an additional Dobbie machine was purchased and
available for the quarter?
3) What would happen to the total cost if one of the Pantera machines broke and could not
be used at all during the quarter?
4) What would happen to the total cost if an additional Pantera machine was purchased and
available for the quarter?
Prof. Gustavo Vulcano
Question #2 (25 points)
Sharon owns an indoor/outdoor-decorating firm in North Dakota and needs white sand and raw
cotton for a project for one of her biggest customers. She needs 20,000 pounds of white sand and
6,000 pounds of raw cotton. A local supplier can sell her up to 15,000 pounds of white sand for
$0.20 per pound and as much raw cotton as she wants for $0.50 per pound. One of the trucks that
Sharon’s company owns has just made a delivery in Key West, Florida, and is scheduled to
return empty to North Dakota. Sharon has just found out that white sand can be purchased in
Florida for $0.09 per pound and that raw cotton can be purchased in Alabama for $0.36 per
pound. The amount the truck can carry is limited by weight restrictions to 10,000 pounds. Also,
load balancing must be taken into consideration. To ensure proper weight distribution to
maintain stability for the truck, the weight of the sand on the truck must be at least twice the
weight of the raw cotton on the truck. Assume that the additional cost for picking up the sand
and raw cotton and for the increased consumption of diesel fuel for the truck to carry the added
weight can be ignored.
a) Formulate Sharon’s problem as a linear program. Make sure to include the managerial
problem definition. Identify:
i. All decision variables
ii. Objective function
iii. All relevant constraints.
b) Using Excel Solver, optimize the LP model of part (a). Using the Excel Answer Report
identify the optimal procurement plan and all binding constraints. Please, include the Excel
Answer Report table as part of the main file in your solution.
c) Answer the questions below by editing the mathematical model and explain your responses:
1) Should Sharon negotiate a white sand volume beyond 15,000 with the local supplier?
2) How much would Sharon be willing to pay to decrease the project’s white sand
requirement from 20,000 to
i) 10,000 pounds?
ii) 5,000 pounds?
3) Can you quantify the economic cost of the constraint that forces the truck to use (at least)
a 2:1 ratio to balance the load of white sand and raw cotton?
4) Would you recommend Sharon to rent an additional truck to ship white sand from Florida
and raw cotton from Alabama?
Prof. Gustavo Vulcano
Question #3 (30 points)
In November, Jeff Hastings of the fashion skiwear manufacturer Hastings Sportswear, Inc., faces
the task of committing to specific production quantities for each skiwear item the company will
offer in the coming year’s line. Commitments are needed immediately in order to reserve space
in production facilities located throughout Asia. Actual demand for these products will not
become known for at least six months.
Production costs for a typical parka run about 75% of the wholesale price, which in this case is
$110. Unsold parkas can be sold at salvage for around 8% of the wholesale price.
Jeff has asked six of his most knowledgeable people to make forecasts of demand for the various
models of parkas. Forecasts for one product are given in the following table, along with the
average and standard deviation of the forecasts. (Experience suggests that the actual standard
deviation in demand is roughly twice that of the standard deviation in the forecasts.) Based on
this information and assuming that the demand follows a normal distribution, Jeff wants to
evaluate different order quantities for this model and assess expected profits and expected
number of leftover parkas by the end of the season.
Forecaster Assessment
1 900
2 1,000
3 900
4 1,300
5 800
6 1,200
Average 1,017
St. Dev. 194
Noting the previous comment, for the questions below assume that the standard deviation of the
demand is 194×2=388.
PART A: Excel & Crystal Ball
A1) By using Excel and Crystal Ball, formulate a simulation model for this problem, considering
an order quantity for parkas equal to the demand mean. Identify:
i. Assumptions
ii. Forecasts
Make sure to include the managerial problem definition.
Note: When simulating demand, make sure that it does not take negative values by replacing
these negative values with zeroes. There is an Excel template for this problem.
A2) Run at least 5,000 simulations for this model. Include the frequency chart for both forecast
cells. Then, build a 95% confidence interval for both the expected profit and the expected
number of leftover units. Hint: Do not get confused with the 95% certainty provided in the
Crystal Ball forecast chart.
Prof. Gustavo Vulcano
A3) Repeat (b) for two different order quantities: i) Order 814, which is 20% lower than the
quantity in (a), and ii) order 1,220, which is 20% higher. Compare the three expected profit
results and provide the intuition for the best order quantity among the three options
PART B: Python
B1) The script file is a script to run this simulation model in
Python. To this end, you must first install the Anaconda/Spyder framework by following the
instructions provided in the ‘Software info’ tab in Brightspace. It provides the skeleton that
you could use for designing other simulation models.
B2) Run the script and report the mean profit and the standard deviation of the profit.
B3) Edit the script by uncommenting the sketch of the functions
get_confidence_interval_for_mean (to get a 95% confidence interval (CI) for the expected
profit) and get_sample_prob_minx (to compute the probability of having a positive
cashflow). You will have to complete both functions by adding the code needed to perform
the corresponding tasks. Report the results obtained from these functions.
B4) Repeat (B3) for the order quantities 814 and 1,220.
Prof. Gustavo Vulcano
Question #4 (20 points)
Consider now a producer of industrial chemicals that has six manufacturing facilities S1, S2, …,
S6, from which it ships its products to six regions in the country with respective demands D1,
D2, …, D6. The supply-demand setting appears to be quite balanced: the capacity of each plant
is 100 units/day, and the demand at each of the six regions is on average 100 units/day. More
precisely, assume that the demand in each of the regions follows a normal distribution with mean
100 units/day and standard deviation 40 units/day, and that the demands from different regions
are independent.
The company wants to evaluate three shipping configurations:
1) Fully flexible: The demand from any region can be served from the supply from any region.
2) Dedicated: The demand for each region is served only from the corresponding facility; i.e.,
S1 serves D1, S2 serves D2, …, and S6 serves D6.
3) Short chain: The supply-demand points are fully flexible by pairs, as follows:
The company is interested in estimating the total aggregate volume of sales and the probability
of selling the maximum possible number of units (i.e., 600) for each of the configurations.
a) Provide the managerial problem definition.
b) Using Crystal Ball, simulate the total aggregate volume of sales for each of the three
configurations in the same spreadsheet, by using the same demand realizations for the three
configurations. That is, for a given set of demands, determine what is the total volume of
sales under each of the three configurations. Do this for 5,000 trials, and report the
respective approximate 95% confidence intervals for the expected volume of sales. Also,
provide the estimated probability of selling 600 units for each of the three configurations.
Hint: For the 95% confidence interval, do not get confused with the 95% certainty
provided in the Crystal Ball forecast chart. There is an Excel template for this problem.
c) Provide the intuition for the rank of the three configurations with respect to the total volume
of sales.