Analytics
In previous chapters, we learned, created and loaded Finance and Supply chain datasets in to Julia language dataframes, RDBMS database.
We also created a Graph schema, loaded ERP data into Graph schema and created pattern matching Graph SQLs to get data insights.
In this chapter, we'll load Original ERP dataset WITH Graph GSQLs results to run interactive data analytics.
These interactive data anlytics will help us run interactive visualiztion/analysis on following sample scenarios.
what-if, would, could, should
Finance
- Region A is merged with Region B
- Employee resume work from office, how much Travel amounts % will increase.
- % of Office supply expenses given to Employee as home office setup
- would Region A, Cash Flow Investment have returned 7% ROI
- would Region B received Government/investor funding
- could have increased IT operating expenses by 5%
- could have reduced HR temp staff
- should have paid vendor invoiced on time to recive rebate
- should have applied loan to increase production
- should have retired a particular Asset
Supply chain
- during 2020, what UNSPSC category experience highest surge in source orders
- during 2020, in "cleaning, supplies" category, what are the other items sold more than average
- what are most requested items in "cleaning, supplies" category
- what are most requested items in "essentials" category
- what are most requested items in "life savings drugs / GUDID implants" category
- was there an decrease in implants, in other words, were they any cardiac or other critical operational surgery delay during covid
- what are items purchase together with devices in category = ventilators
- was there an increased use items categorized as "controlled substance" items
- what are the items, which were in backlog most frequently and waited longest
GL BalanceSheet, IncomeStatement & CashFlow
Balance Sheet (Interactive)
@manipulate for ld = Dict("Actuals"=> "Actuals", "Budget" => "Budget"),
rg = Dict("Region A"=> "Region A", "Region B" => "Region B", "Region C" => "Region C"),
yr = slider(2020:1:2022; value=2021),
qtr = 1:1:4
@show ld, rg, yr, qtr
select(gdf_plot[(
(gdf_plot.FISCAL_YEAR .== yr)
.&
(gdf_plot.QTR .== qtr)
.&
(gdf_plot.LEDGER .== ld)
.&
(gdf_plot.OPER_UNIT .== rg)
),:],
:OPER_UNIT => :Org,
:FISCAL_YEAR => :FY,
:QTR => :Qtr,
:ACCOUNTS_CLASSIFICATION => :Accounts,
:DEPT_CLASSIFICATION => :Dept,
# :LOCATION_CLASSIFICATION => :Region,
:LOCATION_DESCR => :Loc,
:TOTAL => :TOTAL)
end
using Pkg
Pkg.add("DataFrames")
Pkg.add("Dates")
Pkg.add("CategoricalArrays")
Pkg.add("Interact")
Pkg.add("WebIO")
Pkg.build("WebIO")
using DataFrames, Dates, Interact, CategoricalArrays, WebIO
Pkg.status();
Updating registry at `~/.julia/registries/General`
Updating git-repo `https://github.com/JuliaRegistries/General.git`
Resolving package versions...
No Changes to `~/.julia/environments/v1.7/Project.toml`
No Changes to `~/.julia/environments/v1.7/Manifest.toml`
Resolving package versions...
No Changes to `~/.julia/environments/v1.7/Project.toml`
No Changes to `~/.julia/environments/v1.7/Manifest.toml`
Resolving package versions...
No Changes to `~/.julia/environments/v1.7/Project.toml`
No Changes to `~/.julia/environments/v1.7/Manifest.toml`
Resolving package versions...
No Changes to `~/.julia/environments/v1.7/Project.toml`
No Changes to `~/.julia/environments/v1.7/Manifest.toml`
Resolving package versions...
No Changes to `~/.julia/environments/v1.7/Project.toml`
No Changes to `~/.julia/environments/v1.7/Manifest.toml`
Building WebIO → `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/a8bbcd0b08061bba794c56fb78426e96e114ae7f/build.log`
Status `~/.julia/environments/v1.7/Project.toml`
[336ed68f] CSV v0.10.4
[54eefc05] Cascadia v1.0.2
[324d7699] CategoricalArrays v0.10.6
[8f4d0f93] Conda v1.7.0
[a93c6f00] DataFrames v1.3.4
[31c24e10] Distributions v0.25.64
[e30172f5] Documenter v0.27.19
[8f5d6c58] EzXML v1.1.0
[708ec375] Gumbo v0.8.1
[cd3eb016] HTTP v0.9.17
[7073ff75] IJulia v1.23.3
[c601a237] Interact v0.10.4
[0f8b85d8] JSON3 v1.9.5
[b9914132] JSONTables v1.0.3
[4d0d745f] PDFIO v0.1.13
[c3e4b0f8] Pluto v0.19.9
[2dfb63ee] PooledArrays v1.4.2
[438e738f] PyCall v1.93.1
[88034a9c] StringDistances v0.11.2
[a2db99b7] TextAnalysis v0.7.3
[05625dda] WebDriver v0.1.2
[0f1e0344] WebIO v0.8.18
[fdbf4ff8] XLSX v0.7.10
[ade2ca70] Dates
[8bb1440f] DelimitedFiles
using DataFrames, Dates
# create dummy data
accountsDF = DataFrame(
ENTITY = "Apple Inc.",
AS_OF_DATE=Date("1900-01-01", dateformat"y-m-d"),
ID = 11000:1000:45000,
CLASSIFICATION=repeat([
"OPERATING_EXPENSES","NON-OPERATING_EXPENSES", "ASSETS","LIABILITIES","NET_WORTH","STATISTICS","REVENUE"
], inner=5),
CATEGORY=[
"Travel","Payroll","non-Payroll","Allowance","Cash",
"Facility","Supply","Services","Investment","Misc.",
"Depreciation","Gain","Service","Retired","Fault.",
"Receipt","Accrual","Return","Credit","ROI",
"Cash","Funds","Invest","Transfer","Roll-over",
"FTE","Members","Non_Members","Temp","Contractors",
"Sales","Merchant","Service","Consulting","Subscriptions"],
STATUS="A",
DESCR=repeat([
"operating expenses","non-operating expenses","assets","liability","net-worth","stats","revenue"], inner=5),
ACCOUNT_TYPE=repeat(["E","E","A","L","N","S","R"],inner=5));
# DEPARTMENT Chartfield
deptDF = DataFrame(
AS_OF_DATE=Date("2000-01-01", dateformat"y-m-d"),
ID = 1100:100:1500,
CLASSIFICATION=["SALES","HR", "IT","BUSINESS","OTHERS"],
CATEGORY=["sales","human_resource","IT_Staff","business","others"],
STATUS="A",
DESCR=[
"Sales & Marketing","Human Resource","Infomration Technology","Business leaders","other temp"
],
DEPT_TYPE=["S","H","I","B","O"]);
# LOCATION Chartfield
locationDF = DataFrame(
AS_OF_DATE=Date("2000-01-01", dateformat"y-m-d"),
ID = 11:1:22,
CLASSIFICATION=repeat([
"Region A","Region B", "Region C"], inner=4),
CATEGORY=repeat([
"Region A","Region B", "Region C"], inner=4),
STATUS="A",
DESCR=[
"Boston","New York","Philadelphia","Cleveland","Richmond",
"Atlanta","Chicago","St. Louis","Minneapolis","Kansas City",
"Dallas","San Francisco"],
LOC_TYPE="Physical");
# creating Ledger
ledgerDF = DataFrame(
LEDGER = String[], FISCAL_YEAR = Int[], PERIOD = Int[], ORGID = String[],
OPER_UNIT = String[], ACCOUNT = Int[], DEPT = Int[], LOCATION = Int[],
POSTED_TOTAL = Float64[]
);
# create 2020 Period 1-12 Actuals Ledger
l = "Actuals";
fy = 2020;
for p = 1:12
for i = 1:10^5
push!(ledgerDF, (l, fy, p, "ABC Inc.", rand(locationDF.CATEGORY),
rand(accountsDF.ID), rand(deptDF.ID), rand(locationDF.ID), rand()*10^8))
end
end
# create 2021 Period 1-4 Actuals Ledger
l = "Actuals";
fy = 2021;
for p = 1:4
for i = 1:10^5
push!(ledgerDF, (l, fy, p, "ABC Inc.", rand(locationDF.CATEGORY),
rand(accountsDF.ID), rand(deptDF.ID), rand(locationDF.ID), rand()*10^8))
end
end
# create 2021 Period 1-4 Budget Ledger
l = "Budget";
fy = 2021;
for p = 1:12
for i = 1:10^5
push!(ledgerDF, (l, fy, p, "ABC Inc.", rand(locationDF.CATEGORY),
rand(accountsDF.ID), rand(deptDF.ID), rand(locationDF.ID), rand()*10^8))
end
end
# here is ~3 million rows ledger dataframe
size(ledgerDF)
# rename dimensions columns for innerjoin
df_accounts = rename(accountsDF, :ID => :ACCOUNTS_ID, :CLASSIFICATION => :ACCOUNTS_CLASSIFICATION,
:CATEGORY => :ACCOUNTS_CATEGORY, :DESCR => :ACCOUNTS_DESCR);
df_dept = rename(deptDF, :ID => :DEPT_ID, :CLASSIFICATION => :DEPT_CLASSIFICATION,
:CATEGORY => :DEPT_CATEGORY, :DESCR => :DEPT_DESCR);
df_location = rename(locationDF, :ID => :LOCATION_ID, :CLASSIFICATION => :LOCATION_CLASSIFICATION,
:CATEGORY => :LOCATION_CATEGORY, :DESCR => :LOCATION_DESCR);
# join Ledger accounts chartfield with accounts chartfield dataframe to pull all accounts fields
# join Ledger dept chartfield with dept chartfield dataframe to pull all dept fields
# join Ledger location chartfield with location chartfield dataframe to pull all location fields
df_ledger = innerjoin(
innerjoin(
innerjoin(ledgerDF, df_accounts, on = [:ACCOUNT => :ACCOUNTS_ID], makeunique=true),
df_dept, on = [:DEPT => :DEPT_ID], makeunique=true), df_location,
on = [:LOCATION => :LOCATION_ID], makeunique=true);
# note, how ledger DF has 28 columns now (inclusive of all chartfields join)
size(df_accounts),size(df_dept),size(df_location), size(ledgerDF), size(df_ledger)
function periodToQtr(x)
if x ∈ 1:3
return 1
elseif x ∈ 4:6
return 2
elseif x ∈ 7:9
return 3
else return 4
end
end
# now we will use this function to transform a new column
transform!(df_ledger, :PERIOD => ByRow(periodToQtr) => :QTR)
# let's create one more generic function, which converts a number to USD currency
function numToCurrency(x)
return string("USD ",round(x/10^6; digits = 2), " million")
end
transform!(df_ledger, :POSTED_TOTAL => ByRow(numToCurrency) => :TOTAL)
df_ledger[1:5,["POSTED_TOTAL","TOTAL"]]
"df_ledger_size after transformation is: ", size(df_ledger)
("df_ledger_size after transformation is: ", (2800000, 30))
Income Statement (Interactive)
@manipulate for ld = Dict("Actuals"=> "Actuals", "Budget" => "Budget"),
rg = Dict("Region A"=> "Region A", "Region B" => "Region B", "Region C" => "Region C"),
yr = slider(2020:1:2022; value=2021),
qtr = 1:1:4
@show ld, rg, yr, qtr
select(gdf_plot[(
(gdf_plot.FISCAL_YEAR .== yr)
.&
(gdf_plot.QTR .== qtr)
.&
(gdf_plot.LEDGER .== ld)
.&
(gdf_plot.OPER_UNIT .== rg)
.&
(in.(gdf_plot.ACCOUNTS_CLASSIFICATION, Ref(["ASSETS", "LIABILITIES", "REVENUE","NET_WORTH"])))
),:],
:OPER_UNIT => :Org,
:FISCAL_YEAR => :FY,
:QTR => :Qtr,
:ACCOUNTS_CLASSIFICATION => :Accounts,
:DEPT_CLASSIFICATION => :Dept,
# :LOCATION_CLASSIFICATION => :Region,
:LOCATION_DESCR => :Loc,
:TOTAL => :TOTAL)
end
Cash Flow Statement (Interactive)
@manipulate for ld = Dict("Actuals"=> "Actuals", "Budget" => "Budget"),
rg = Dict("Region A"=> "Region A", "Region B" => "Region B", "Region C" => "Region C"),
yr = slider(2020:1:2022; value=2021),
qtr = 1:1:4
@show ld, rg, yr, qtr
select(gdf_plot[(
(gdf_plot.FISCAL_YEAR .== yr)
.&
(gdf_plot.QTR .== qtr)
.&
(gdf_plot.LEDGER .== ld)
.&
(gdf_plot.OPER_UNIT .== rg)
.&
(in.(gdf_plot.ACCOUNTS_CLASSIFICATION, Ref(["NON-OPERATING_EXPENSES","OPERATING_EXPENSES"])))
),:],
:OPER_UNIT => :Org,
:FISCAL_YEAR => :FY,
:QTR => :Qtr,
:ACCOUNTS_CLASSIFICATION => :Accounts,
:DEPT_CLASSIFICATION => :Dept,
# :LOCATION_CLASSIFICATION => :Region,
:LOCATION_DESCR => :Loc,
:TOTAL => :TOTAL)
end
Supplychain Material Service Request
sampleSize = 1000 # number of rows, scale as needed
dfMSR = DataFrame(
UNIT = rand(dfOrgMaster.UNIT, sampleSize),
MSR_DATE=rand(collect(Date(2020,1,1):Day(1):Date(2022,5,1)), sampleSize),
FROM_UNIT = rand(dfOrgMaster.UNIT, sampleSize),
TO_UNIT = rand(dfOrgMaster.UNIT, sampleSize),
GUDID = rand(dfGUDIDdevice.PrimaryDI, sampleSize),
QTY = rand(dfOrgMaster.UNIT, sampleSize));
first(dfMSR, 5)
ITEM/Vendor Master Voucher Invoice data
sampleSize = 1000 # number of rows, scale as needed
dfVCHR = DataFrame(
UNIT = rand(dfOrgMaster.UNIT, sampleSize),
VCHR_DATE=rand(collect(Date(2020,1,1):Day(1):Date(2022,5,1)), sampleSize),
STATUS=rand(["Closed","Paid","Open","Cancelled","Exception"], sampleSize),
VENDOR_INVOICE_NUM = rand(10001:9999999, sampleSize),
VENDOR=rand(unique(dfVendor.companyName), sampleSize),
GUDID = rand(dfGUDIDdevice.PrimaryDI, sampleSize),
QTY = rand(1:150, sampleSize),
UNIT_PRICE = rand(Normal(100, 2), sampleSize)
);
show(first(dfVCHR, 5),allcols=true)
Supply chain Shipment Receipt data
sampleSize = 1000 # number of rows, scale as needed
dfSHIPRECEIPT = DataFrame(
UNIT = rand(dfOrgMaster.UNIT, sampleSize),
SHIP_DATE=rand(collect(Date(2020,1,1):Day(1):Date(2022,5,1)), sampleSize),
STATUS=rand(["Shippped","Returned","In process","Cancelled","Exception"], sampleSize),
SHIPMENT_NUM = rand(10001:9999999, sampleSize),
CUSTOMER=rand(unique(dfVendor.companyName), sampleSize),
GUDID = rand(dfGUDIDdevice.PrimaryDI, sampleSize),
QTY = rand(1:150, sampleSize),
UNIT_PRICE = rand(Normal(100, 2), sampleSize)
);
show(first(dfSHIPRECEIPT, 5),allcols=true)
Supply chain Procurement Pipeline
## Ledger Visual
# plot_data = gdf_plot[(
# (gdf_plot.FISCAL_YEAR .== yr_p)
# .&
# (gdf_plot.LEDGER .== ld_p)
# .&
# (gdf_plot.OPER_UNIT .== rg_p)
# .&
# (gdf_plot.LOCATION_DESCR .== ldescr)
# .&
# (gdf_plot.DEPT_CLASSIFICATION .== ddescr)
# .&
# (gdf_plot.ACCOUNTS_CLASSIFICATION .== adescr))
# , :];
# # @df plot_data scatter(:QTR, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)", label="$ld_p Total by # $yr_p for $rg_p")
# @df plot_data plot(:QTR, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)",
# label=[
# "$ld_p by $yr_p for $rg_p $ldescr $adescr $ddescr"
# ],
# lw=3)
## Actuals vs Budget comparison
# plot_data_a = gdf_plot[(
# (gdf_plot.FISCAL_YEAR .== yr_p)
# .&
# (gdf_plot.LEDGER .== "Actuals")
# .&
# (gdf_plot.OPER_UNIT .== rg_p)
# .&
# (gdf_plot.LOCATION_DESCR .== ldescr)
# .&
# (gdf_plot.DEPT_CLASSIFICATION .== ddescr)
# .&
# (gdf_plot.ACCOUNTS_CLASSIFICATION .== adescr))
# , :];
# @df plot_data scatter(:QTR, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)", label="$ld_p Total by # $yr_p for $rg_p")
# plot_data_b = gdf_plot[(
# (gdf_plot.FISCAL_YEAR .== yr_p)
# .&
# (gdf_plot.LEDGER .== "Budget")
# .&
# (gdf_plot.OPER_UNIT .== rg_p)
# .&
# (gdf_plot.LOCATION_DESCR .== ldescr)
# .&
# (gdf_plot.DEPT_CLASSIFICATION .== ddescr)
# .&
# (gdf_plot.ACCOUNTS_CLASSIFICATION .== adescr))
# , :];
# @df plot_data scatter(:QTR, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)", label="$ld_p Total by $yr_p # for $rg_p")
# @df plot_data_a plot(:QTR, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)",
# label=[
# "Actuals by $yr_p for $rg_p $ldescr $adescr $ddescr"
# ],
# lw=3)
# @df plot_data_b plot!(:QTR, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)",
# label=[
# "Budget by $yr_p for $rg_p $ldescr $adescr $ddescr"
# ],
# lw=3)
# plot_data
# plot_data = gdf_plot[(
# (gdf_plot.FISCAL_YEAR .== yr_p)
# .&
# (gdf_plot.LEDGER .== ld_p)
# .&
# (gdf_plot.OPER_UNIT .== rg_p)
# .&
# (gdf_plot.LOCATION_DESCR .== ldescr)
# .&
# (gdf_plot.DEPT_CLASSIFICATION .== ddescr)
# .&
# (gdf_plot.ACCOUNTS_CLASSIFICATION .== adescr))
# , :];
# @df plot_data scatter(:QTR, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)", label="$ld_p Total by $yr_p for $rg_p")
# @df gdf_plot plot(:QTR, :ACCOUNTS_CLASSIFICATION, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)",
# label=[
# "$ld_p by $yr_p for $rg_p $ldescr $adescr $ddescr"
# ],
# lw=3)
# @df gdf_plot scatter(:QTR, :ACCOUNTS_CLASSIFICATION, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)",
# label=[
# "$ld_p by $yr_p for $rg_p $ldescr for $ddescr"
# ],
# lw=3)
# plot_data = gdf_plot[(
# (gdf_plot.FISCAL_YEAR .== yr_p)
# .&
# (gdf_plot.LEDGER .== ld_p)
# .&
# (gdf_plot.OPER_UNIT .== rg_p)
# .&
# (gdf_plot.LOCATION_DESCR .== ldescr)
# .&
# (gdf_plot.DEPT_CLASSIFICATION .== ddescr)
# .&
# (gdf_plot.ACCOUNTS_CLASSIFICATION .== adescr))
# , :];
# @df plot_data scatter(:QTR, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)", label="$ld_p Total by $yr_p for $rg_p")
# @df gdf_plot plot(:QTR, :ACCOUNTS_CLASSIFICATION, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)",
# label=[
# "$ld_p by $yr_p for $rg_p $ldescr $adescr $ddescr"
# ],
# lw=3)
# @df gdf_plot scatter(:QTR, :DEPT_CLASSIFICATION, :TOTAL/10^8, title = "Finance Ledger Data", xlabel="Quarter", ylabel="Total (in USD million)",
# label=[
# "$ld_p by $yr_p for $rg_p $ldescr for $adescr"
# ],
# lw=3)