p-value, null hypothesis and real time analytics (onlinestat)

In this section, we will see an example how to perform null hypthoses/p-value analysis on Ledger data.

use case

Finance Ledger

Finance Ledger , Balance Sheet, Income Statement and Cash Flow

below is sample Finance Ledger Data, in this Ledger data, we will run p-value to test following Hypothesis.

For a Given Given FISCALY_YEAR and ACCOUNTING_PERIOD, OPERATING EXPENSES are aligned (10%) tolerance range in comparison to BEFORE or AFTER FISCAL_YEAR & ACCOUNTING_PERIOD.

julia> using DataFrames, Plots, Dates
julia> # create dummy data accounts = DataFrame(AS_OF_DATE=Date("1900-01-01", dateformat"y-m-d"), ID = 11000:1000:45000, CLASSIFICATION=repeat([ "OPERATING_EXPENSES","NON-OPERATING_EXPENSES", "ASSETS","LIABILITIES", "NET_WORTH","STATISTICS","REVENUE" ], inner=5), CATEGORY=[ "Travel","Payroll","non-Payroll","Allowance","Cash", "Facility","Supply","Services","Investment","Misc.", "Depreciation","Gain","Service","Retired","Fault.", "Receipt","Accrual","Return","Credit","ROI", "Cash","Funds","Invest","Transfer","Roll-over", "FTE","Members","Non_Members","Temp","Contractors", "Sales","Merchant","Service","Consulting","Subscriptions" ], STATUS="A", DESCR=repeat([ "operating expenses","non-operating expenses", "assets","liability","net-worth","stats","revenue" ], inner=5), ACCOUNT_TYPE=repeat([ "E","E","A","L","N","S","R" ],inner=5));
julia> dept = DataFrame(AS_OF_DATE=Date("2000-01-01", dateformat"y-m-d"), ID = 1100:100:1500, CLASSIFICATION=[ "SALES","HR", "IT","BUSINESS","OTHERS" ], CATEGORY=[ "sales","human_resource","IT_Staff","business","others" ], STATUS="A", DESCR=[ "Sales & Marketing","Human Resource","Infomration Technology","Business leaders","other temp" ], DEPT_TYPE=[ "S","H","I","B","O"]);
julia> location = DataFrame(AS_OF_DATE=Date("2000-01-01", dateformat"y-m-d"), ID = 11:1:22, CLASSIFICATION=repeat([ "Region A","Region B", "Region C"], inner=4), CATEGORY=repeat([ "Region A","Region B", "Region C"], inner=4), STATUS="A", DESCR=[ "Boston","New York","Philadelphia","Cleveland","Richmond", "Atlanta","Chicago","St. Louis","Minneapolis","Kansas City", "Dallas","San Francisco"], LOCA_TYPE="Physical");
julia> ledger = DataFrame( LEDGER = String[], FISCAL_YEAR = Int[], PERIOD = Int[], ORGID = String[], OPER_UNIT = String[], ACCOUNT = Int[], DEPT = Int[], LOCATION = Int[], POSTED_TOTAL = Float64[] );
julia> # create 2020 Period 1-12 Actuals Ledger l = "Actuals";
julia> fy = 2020;
julia> for p = 1:12 for i = 1:10^5 push!(ledger, (l, fy, p, "ABC Inc.", rand(location.CATEGORY), rand(accounts.ID), rand(dept.ID), rand(location.ID), rand()*10^8)) end end
julia> # create 2021 Period 1-4 Actuals Ledger l = "Actuals";
julia> fy = 2021;
julia> for p = 1:4 for i = 1:10^5 push!(ledger, (l, fy, p, "ABC Inc.", rand(location.CATEGORY), rand(accounts.ID), rand(dept.ID), rand(location.ID), rand()*10^8)) end end
julia> # create 2021 Period 1-4 Budget Ledger l = "Budget";
julia> fy = 2021;
julia> for p = 1:12 for i = 1:10^5 push!(ledger, (l, fy, p, "ABC Inc.", rand(location.CATEGORY), rand(accounts.ID), rand(dept.ID), rand(location.ID), rand()*10^8)) end end
julia> ledger[:,:]2800000×9 DataFrame Row │ LEDGER FISCAL_YEAR PERIOD ORGID OPER_UNIT ACCOUNT DEPT ⋯ │ String Int64 Int64 String String Int64 Int64 ⋯ ─────────┼────────────────────────────────────────────────────────────────────── 1 │ Actuals 2020 1 ABC Inc. Region B 34000 1400 ⋯ 2 │ Actuals 2020 1 ABC Inc. Region B 16000 1500 3 │ Actuals 2020 1 ABC Inc. Region A 13000 1300 4 │ Actuals 2020 1 ABC Inc. Region C 45000 1200 5 │ Actuals 2020 1 ABC Inc. Region A 29000 1500 ⋯ 6 │ Actuals 2020 1 ABC Inc. Region B 18000 1300 7 │ Actuals 2020 1 ABC Inc. Region B 11000 1300 8 │ Actuals 2020 1 ABC Inc. Region C 22000 1400 ⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱ 2799994 │ Budget 2021 12 ABC Inc. Region B 28000 1500 ⋯ 2799995 │ Budget 2021 12 ABC Inc. Region B 39000 1100 2799996 │ Budget 2021 12 ABC Inc. Region A 32000 1300 2799997 │ Budget 2021 12 ABC Inc. Region B 32000 1300 2799998 │ Budget 2021 12 ABC Inc. Region A 28000 1300 ⋯ 2799999 │ Budget 2021 12 ABC Inc. Region A 33000 1300 2800000 │ Budget 2021 12 ABC Inc. Region C 11000 1300 2 columns and 2799985 rows omitted

p-value function

getPValue(ledger, ACCOUNTCLASSIFICATION = "OPERATINGEXPENSES", FISCALYEAR =2021, ACCOUNTINGPERIOD = 7)

return p-value for given ACCOUNT NODE

getPValue(ledger, ACCOUNTCLASSIFICATION = "OPERATINGREVENUE", FISCALYEAR =2021, ACCOUNTINGPERIOD = 7)

return p-value for given ACCOUNT NODE

getPValue(ledger, ACCOUNTCLASSIFICATION = "OPERATINGASSETS", FISCALYEAR =2021, ACCOUNTINGPERIOD = 7)

return p-value for given ACCOUNT NODE

getPValue(ledger, DEPTCLASSIFICATION = "OPERATINGEXPENSES", FISCALYEAR =2021, ACCOUNTINGPERIOD = 7)

return p-value for given DEPT NODE

getPValue(ledger, DEPTCLASSIFICATION = "OPERATINGREVENUE", FISCALYEAR =2021, ACCOUNTINGPERIOD = 7)

return p-value for given DEPT NODE

getPValue(ledger, DEPTCLASSIFICATION = "OPERATINGASSETS", FISCALYEAR =2021, ACCOUNTINGPERIOD = 7)

return p-value for given DEPT NODE

getPValue(ledger, REGIONCLASSIFICATION = "OPERATINGEXPENSES", FISCALYEAR =2021, ACCOUNTINGPERIOD = 7)

return p-value for given REGION NODE

getPValue(ledger, REGIONCLASSIFICATION = "OPERATINGREVENUE", FISCALYEAR =2021, ACCOUNTINGPERIOD = 7)

return p-value for given REGION NODE

getPValue(ledger, REGIONCLASSIFICATION = "OPERATINGASSETS", FISCALYEAR =2021, ACCOUNTINGPERIOD = 7)

return p-value for given REGION NODE