Manual/API

GeneralLedger.getArrangedWordsMethod

getArrangedWords(str) Call this function to remove duplicates, unwanted symbols and return uppercase unique values

Example

julia> wd = getArrangedWords("Amit ; Shukla SHUKLA Shukle , . AmIT Amit # Shuklam Amit ,")
source
GeneralLedger.getCategoryDataMethod

getCategoryData(df_dname:: DataFrame, colName:: AbstractString, categoryRange) Call this function to create a new column on DataFrame which provide a category based on ranges provided.

column must be contain only numerical values.

First parameter is the DataFrame, next is Column name for which categories are created, followed by Ranges.

Function returns dataframe with an extra columns with Categories like 1,2,3 and 0 for unmatched.

Example

julia> df_dname = DataFrame(name=["John Doe", "John Doe","MICHAEL Doe", "Jacob Doe", "Julia Dpe", "Michael Jackson"],age=[35,35,35,10,5,45], state=["CA","CA","CA","CA","CO","CA"])
julia> catData = getCategoryData(df_dname, "age", [0:10,10:20,20:30])
source
GeneralLedger.getDBConnectionMethod
getDBConnection(credFilePath:: AbstractString)

Call this function to read database credentials

database credentials are stored in environment.txt file in following format.

environment.txt

user=username
pwd=password
dsn=userdsnname
hive=hdinsightstr
port=portnumber

Example

julia> fl = getDBConnection("environment.txt")
source
GeneralLedger.getDSNsMethod
getDSNs()

Call this function to read computing machine available DSNs

Example

julia> dsns = getDSNs()
source
GeneralLedger.getDriversMethod
getDrivers()

Call this function to read computing machine available Drivers

Example

julia> drivers = getDrivers()
source
GeneralLedger.getDuplicateRowsMethod

getDuplicateRows(df_dname:: DataFrame, colNames:: Vector) Call this function to find duplicates in a data frame column based on columnnames (key columns)

First parameter is the DataFrame, followed by all columnnames in DataFrame which are key columns in a dataframe.

Function returns dataframe row indexes which are duplicates based on key columns provided.

Example

julia> df_dname = DataFrame(name=["John Doe", "John Doe","MICHAEL Doe", "Jacob Doe", "Julia Dpe", "Michael Jackson"],age=[35,35,35,10,5,45], state=["CA","CA","CA","CA","CO","CA"])
julia> dup = getDuplicateRows(df_dname, ["name","age"])
source
GeneralLedger.getFileMethod
getFile(url::AbstractString, downloadPath:: AbstractString)

Call this function to download a file

first parameter is webpage url, next parameter is output directory path including file name

Example

julia> fl = getFile("https://s2.q4cdn.com/470004039/files/doc_financials/2021/q2/FY21-Q2-Consolidated-Financial-Statements.pdf", "<folder_name>/test.csv")
source
GeneralLedger.getFuzzyWuzzyMethod

getFuzzyWuzzy(str) Call this function to find closest match for a given string in data frame column lookup

First parameter is the search string, second is the DataFrame followed by columnname in DataFrame which needs to be searched

Example

julia> df_dname = DataFrame(name=["John Doe", "Jen Doe","MICHAEL Doe", "Jacob Doe", "Julia Dpe", "Michael Jackson"],age=[35,26,35,10,5,45])
julia> wd = getFuzzyWuzzy("Mike Jackson", df_dname, "name")
source
GeneralLedger.getJSONintoDataFrameMethod
getJSONintoDataFrame(path::AbstractString)

Call this function to read json file from url and retrieve results into DataFrame

first parameter is webpage url, next parameter is output directory path

Example

julia> df = getJSONintoDataFrame("https://api.coindesk.com/v1/bpi/currentprice.json", "downloads/web")
source
GeneralLedger.getKeyColumnsMethod

getKeyColumns(df_dname:: DataFrame) Call this function to find key columns in a data frame

First parameter is the DataFrame

Function returns dataframe columns indexes which are key columns.

Example

julia> df_dname = DataFrame(name=["John Doe", "John Doe","MICHAEL Doe", "Jacob Doe", "Julia Dpe", "Michael Jackson"],age=[35,35,35,10,5,45], state=["CA","CA","CA","CA","CO","CA"])
julia> kcols = getKeyColumns(df_dname)
source
GeneralLedger.getMaskedDataMethod

getMaskedData(df_dname:: DataFrame, colNames:: Vector) Call this function to create a flatten tree data structure.

First parameter is the DataFrame, next is Column names which needs to be masked

Function returns a two dataframes, one with masked data and other with masked + original data.

Example

julia> df_dname = DataFrame(name=["John Doe", "John Doe","MICHAEL Doe", "Jacob Doe", "Julia Dpe", "Michael Jackson"],age=[35,35,35,10,5,45], state=["CA","CA","CA","CA","CO","CA"])
julia> catData = getMaskedData(df_dname, "name")
source
GeneralLedger.getPullFilesMethod
getPullFiles(path::AbstractString)

Call this function to read urls in local txt files line by line, and download each file

Example

julia> fl = getPullFiles("c:mit.lailename.txt")
source
GeneralLedger.getSQLsMethod
getSQLs(credFilePath:: AbstractString)

Call this function to read sqls for a given table from txt file. SQLs are stored in txt file in this format.

sqls.txt

createTable1=INSERT INTO table1 (column1) SELECT table2.column1 FROM table2 WHERE table2.column1 > 100;
readTable1=SELECT * FROM table1;
updateTable1=UPDATE table SET column1 = value1, column2 = value2, ... WHERE condition;
upsertTable1=BEDIN tran IF EXISTS (SELECT * FROM table1 WITH (updlock,serializable) WHERE key = @key) BEGIN UDPATE table1 SET ... WHERE key = @key END ELSE BEGIN INSERT INTO table1 (key, ...) VALUES (@key, ...) END COMMIT TRAN
softDeleteTable1=UPDATE table SET deleted=True, ... WHERE condition;
hardDeleteTable1=delete * from table1 where table1.column1 > 100

Example

julia> fl = getSQLs("environment.txt")
source
GeneralLedger.getSampleBDepositMethod
getSampleBDeposit(d:: Deposit)

Call this function to calculate simple yearly interest.
    principal::T # =10000 # Principal amount
    rate::T # =3.875  Rate of Interest
    compound::T # compound frequency - Daily=365, Monthly=12, Qtr=4, Annually=1
    time::T # number of deposit months

function getSampleBDeposit(d) # buddy deposit
    P = d.principal
    n = d.compound # simple interest calculated per year
    r = d.rate/100;
    t = d.time/12;
    return P * (1 + (r * t) / n) - P, P * (1 + (r * t) / n) # interest, total
end
source
GeneralLedger.getSampleCDepositMethod
getSampleCDeposit(d:: Deposit)
Call this function to calculate accumulated compound interest.

P::Int64=10000 # Principal amount
r::Float64=3.875 # Rate of Interest
n::Int64=1, # compound frequency - Daily=365, Monthly=12, Qtr=4, Annually=1
t::Int64=60 # number of deposit months

function getSampleCDeposit(d::Deposit) # certificate deposit
    P = d.principal
    n = d.compound
    r = d.rate / 100
    t = d.time / 12
    return P * (1 + r / n)^(n * t) - P, P * (1 + r / n)^(n * t)# interest, total
end
source
GeneralLedger.getSampleDataTimeTakenFunction
getSampleDataTimeTaken(sampleSize::Int64=10)

Call this function to produce sample datasets showing Timetaken by vehicles.

Time taken to travel between two cities via Air, Bus, Train or personal vehicle depends mostly depends on speed and distance. However, there are other factors like weather, season, population or faults, which may occasionally impact travel time. However, knowing intensity of these factors, it's still possible to predict time taken rationally.

There are many assumptions when calculating Time taken, like, Passenger needs to travel to bus, train station or time to get ready, wait time etc. is not reflected here. Let's capture these assumptions as in column "bias" for now.

source
GeneralLedger.getSampleDepositsDataFunction
getSampleDepositsData(sampleSize::Int64=10, P::Int64=10000, r::Float64=3.875, n::Int=1, t::Int=60)

Call this function to produce sample datasets showing deposits.
please see, this function generate only sample data with bias (intentionally biased and wrong, so that a pattern can be analyzed)
and must not be used for any real calculations.

sampleSize::Int64=10 # number of rows generated
P::Int64=10000 # Principal amount
r::Float64=3.875 # Rate of Interest
n::Int=1, # compound frequency - Daily=365, Monthly=12, Qtr=4, Annually=1
t::Int=60 # number of deposit months
source
GeneralLedger.getSampleRDepositMethod
getSampleRDeposit(d:: Deposit)
Call this function to calculate Random Interest return.

P::Int=10000 # Principal amount
r::Float64=3.875 # Rate of Interest
n::Int=1, # compound frequency - Daily=365, Monthly=12, Qtr=4, Annually=1
t::Int=60 # number of deposit months

function getSampleRDeposit(d::Deposit) # random deposit
    P = d.principal
    x = rand(-0.51:0.02:0.99)
    return P * (1 + x) - P, P * (1 + x) # interest, total
end
source
GeneralLedger.getTokensMethod

getTokens(s, token_type) Call this function to extract tokens from string (example - extract urls)

First parameter is the complete string text, next parameter is list of words to be removed.

Example

julia> wd = getTokens("https://yahoo.com is the Yahoo website url", url)
source
GeneralLedger.getTreeDataMethod

getTreeData(df_dname:: DataFrame, colName:: AbstractString) Call this function to create a flatten tree data structure.

First parameter is the DataFrame, next is Column name for which tree hierachies (flattened) are created.

Function returns a new dataframe with Levels.

Example

julia> df_dname = DataFrame(name=["John Doe", "John Doe","MICHAEL Doe", "Jacob Doe", "Julia Dpe", "Michael Jackson"],age=[35,35,35,10,5,45], state=["CA","CA","CA","CA","CO","CA"])
julia> catData = getTreeData(df_dname, "state")
source
GeneralLedger.getWebLinksMethod
getWebLinks(url::AbstractString, fileTypes::AbstractVector, downloadPath:: AbstractString)

Call this function to crawl through a web page and download all file links

first parameter is webpage url, next parameter is list of all file extensions user wish to download followed by output directory path

Example

julia> fl = getWebLinks("https://investor.apple.com/investor-relations/default.aspx#tabs_content--2021", ["pdf","csv","xlsx","xls"], "downloads/web")
source
GeneralLedger.getXMLintoDataFrameMethod
getXMLintoDataFrame(path::AbstractString)

Call this function to read XML file from url and retrieve results into DataFrame

first parameter is webpage url, next parameter is output directory path

Example

julia> df = getXMLintoDataFrame("https://api.coindesk.com/v1/bpi/currentprice.json", "downloads/web")
source
GeneralLedger.runSQLMethod
runSQL(conn:: AbstractString, sql:: AbstractString)

Call this function to run sql in database

Example

julia> res = runSQL(conn, SQLStatement)
source
GeneralLedger.setColNamesMethod
setColNames(str)

Call this function to update column names

remove blank spaces, dollar sign, or Hash chars

make all columns uppercase and replace hyphen with underscore

source
GeneralLedger.setRemDuplicateRowsMethod

setRemDuplicateRows(df_dname:: DataFrame, colNames:: Vector) Call this function to find & delete duplicates in a data frame column based on columnnames (key columns)

First parameter is the DataFrame, followed by all columnnames in DataFrame which are key columns in a dataframe.

Function returns dataframe after removing duplicates based on key columns provided. in case of duplicates, it retains first row.

Example

julia> df_dname = DataFrame(name=["John Doe", "John Doe","MICHAEL Doe", "Jacob Doe", "Julia Dpe", "Michael Jackson"],age=[35,35,35,10,5,45], state=["CA","CA","CA","CA","CO","CA"])
julia> dup = setRemDuplicateRows(df_dname, ["name","age"])
source
GeneralLedger.setRemoveTextMethod

setRemoveText(str) Call this function to find and remove word in text string

First parameter is the complete string text, next parameter is list of words to be removed.

Example

julia> wd = setRemoveText("Amit Shukla Shkla Los Angel Angeles", ["Shkla", "Angel"])
source
GeneralLedger.setRemoveTokensMethod

setRemoveTokens(str) Call this function to remove tokens

Example

julia> wd = setRemoveTokens("Amit Shukla Shkla Los Angel Angeles")
source
GeneralLedger.setReplaceTextMethod

setReplaceText(str) Call this function to find and replace word in text string

First parameter is the term to be replaced, next is the term replaced with followed by text string where text is searched and replaced.

Example

julia> wd = setReplaceText("Shkla","Shukla ","Amit Shkla Los Angeles")
source
GeneralLedger.SubRecordINSMetadataMethod

SubRecordINSMetadata(source::String, createDTTM::String, updateDTTM::String, author::String)

Stores metadata about records/document loaded into data lake. ...

Arguments

  • source::String: stores SOR (system of record i.e. source) name
  • createDTTM::String : data & time when data was first loaded.
  • updateDTTM::String : data & time when data was last updated (equals to createDTTM in case of no updates).
  • author::String : name of the person responsible for loading data.

...

source
GeneralLedger.SubRecordUPDMetadataMethod

SubRecordUPDMetadata(source::String, updateDTTM::String, author::String)

Stores metadata about records/document loaded into data lake. ...

Arguments

  • source::String: stores SOR (system of record i.e. source) name
  • updateDTTM::String : data & time when data was last updated (equals to createDTTM in case of no updates).
  • author::String : name of the person responsible for loading data.

...

source