API · GeneralLedger.jl

Manual/API

GeneralLedger.getArrangedWords — Method

getArrangedWords(str) Call this function to remove duplicates, unwanted symbols and return uppercase unique values

Example

julia> wd = getArrangedWords("Amit ; Shukla SHUKLA Shukle , . AmIT Amit # Shuklam Amit ,")

source

GeneralLedger.getCategoryData — Method

getCategoryData(df_dname:: DataFrame, colName:: AbstractString, categoryRange) Call this function to create a new column on DataFrame which provide a category based on ranges provided.

column must be contain only numerical values.

First parameter is the DataFrame, next is Column name for which categories are created, followed by Ranges.

Function returns dataframe with an extra columns with Categories like 1,2,3 and 0 for unmatched.

Example

julia> df_dname = DataFrame(name=["John Doe", "John Doe","MICHAEL Doe", "Jacob Doe", "Julia Dpe", "Michael Jackson"],age=[35,35,35,10,5,45], state=["CA","CA","CA","CA","CO","CA"])
julia> catData = getCategoryData(df_dname, "age", [0:10,10:20,20:30])

source

GeneralLedger.getDBConnection — Method

getDBConnection(credFilePath:: AbstractString)

Call this function to read database credentials

database credentials are stored in environment.txt file in following format.

environment.txt

user=username
pwd=password
dsn=userdsnname
hive=hdinsightstr
port=portnumber

Example

julia> fl = getDBConnection("environment.txt")

source

GeneralLedger.getDSNs — Method

getDSNs()

Call this function to read computing machine available DSNs

Example

julia> dsns = getDSNs()

source

GeneralLedger.getDrivers — Method

getDrivers()

Call this function to read computing machine available Drivers

Example

julia> drivers = getDrivers()

source

GeneralLedger.getDuplicateRows — Method

getDuplicateRows(df_dname:: DataFrame, colNames:: Vector) Call this function to find duplicates in a data frame column based on columnnames (key columns)

First parameter is the DataFrame, followed by all columnnames in DataFrame which are key columns in a dataframe.

Function returns dataframe row indexes which are duplicates based on key columns provided.

Example

julia> df_dname = DataFrame(name=["John Doe", "John Doe","MICHAEL Doe", "Jacob Doe", "Julia Dpe", "Michael Jackson"],age=[35,35,35,10,5,45], state=["CA","CA","CA","CA","CO","CA"])
julia> dup = getDuplicateRows(df_dname, ["name","age"])

source

GeneralLedger.getFile — Method

getFile(url::AbstractString, downloadPath:: AbstractString)

Call this function to download a file

first parameter is webpage url, next parameter is output directory path including file name

Example

julia> fl = getFile("https://s2.q4cdn.com/470004039/files/doc_financials/2021/q2/FY21-Q2-Consolidated-Financial-Statements.pdf", "<folder_name>/test.csv")

source

GeneralLedger.getFuzzyWuzzy — Method

getFuzzyWuzzy(str) Call this function to find closest match for a given string in data frame column lookup

First parameter is the search string, second is the DataFrame followed by columnname in DataFrame which needs to be searched

Example

julia> df_dname = DataFrame(name=["John Doe", "Jen Doe","MICHAEL Doe", "Jacob Doe", "Julia Dpe", "Michael Jackson"],age=[35,26,35,10,5,45])
julia> wd = getFuzzyWuzzy("Mike Jackson", df_dname, "name")

source

GeneralLedger.getJSONintoDataFrame — Method

getJSONintoDataFrame(path::AbstractString)

Call this function to read json file from url and retrieve results into DataFrame

first parameter is webpage url, next parameter is output directory path

Example

julia> df = getJSONintoDataFrame("https://api.coindesk.com/v1/bpi/currentprice.json", "downloads/web")

source

GeneralLedger.getKeyColumns — Method

getKeyColumns(df_dname:: DataFrame) Call this function to find key columns in a data frame

First parameter is the DataFrame

Function returns dataframe columns indexes which are key columns.

Example

julia> df_dname = DataFrame(name=["John Doe", "John Doe","MICHAEL Doe", "Jacob Doe", "Julia Dpe", "Michael Jackson"],age=[35,35,35,10,5,45], state=["CA","CA","CA","CA","CO","CA"])
julia> kcols = getKeyColumns(df_dname)

source

GeneralLedger.getMaskedData — Method

getMaskedData(df_dname:: DataFrame, colNames:: Vector) Call this function to create a flatten tree data structure.

First parameter is the DataFrame, next is Column names which needs to be masked

Function returns a two dataframes, one with masked data and other with masked + original data.

Example

julia> df_dname = DataFrame(name=["John Doe", "John Doe","MICHAEL Doe", "Jacob Doe", "Julia Dpe", "Michael Jackson"],age=[35,35,35,10,5,45], state=["CA","CA","CA","CA","CO","CA"])
julia> catData = getMaskedData(df_dname, "name")

source

GeneralLedger.getPullFiles — Method

getPullFiles(path::AbstractString)

Call this function to read urls in local txt files line by line, and download each file

Example

julia> fl = getPullFiles("c:mit.lailename.txt")

source

GeneralLedger.getSQLs — Method

getSQLs(credFilePath:: AbstractString)

Call this function to read sqls for a given table from txt file. SQLs are stored in txt file in this format.

sqls.txt

createTable1=INSERT INTO table1 (column1) SELECT table2.column1 FROM table2 WHERE table2.column1 > 100;
readTable1=SELECT * FROM table1;
updateTable1=UPDATE table SET column1 = value1, column2 = value2, ... WHERE condition;
upsertTable1=BEDIN tran IF EXISTS (SELECT * FROM table1 WITH (updlock,serializable) WHERE key = @key) BEGIN UDPATE table1 SET ... WHERE key = @key END ELSE BEGIN INSERT INTO table1 (key, ...) VALUES (@key, ...) END COMMIT TRAN
softDeleteTable1=UPDATE table SET deleted=True, ... WHERE condition;
hardDeleteTable1=delete * from table1 where table1.column1 > 100

Example

julia> fl = getSQLs("environment.txt")

source

GeneralLedger.getSampleBDeposit — Method

getSampleBDeposit(d:: Deposit)

Call this function to calculate simple yearly interest.
    principal::T # =10000 # Principal amount
    rate::T # =3.875  Rate of Interest
    compound::T # compound frequency - Daily=365, Monthly=12, Qtr=4, Annually=1
    time::T # number of deposit months

function getSampleBDeposit(d) # buddy deposit
    P = d.principal
    n = d.compound # simple interest calculated per year
    r = d.rate/100;
    t = d.time/12;
    return P * (1 + (r * t) / n) - P, P * (1 + (r * t) / n) # interest, total
end

source

GeneralLedger.getSampleCDeposit — Method

getSampleCDeposit(d:: Deposit)
Call this function to calculate accumulated compound interest.

P::Int64=10000 # Principal amount
r::Float64=3.875 # Rate of Interest
n::Int64=1, # compound frequency - Daily=365, Monthly=12, Qtr=4, Annually=1
t::Int64=60 # number of deposit months

function getSampleCDeposit(d::Deposit) # certificate deposit
    P = d.principal
    n = d.compound
    r = d.rate / 100
    t = d.time / 12
    return P * (1 + r / n)^(n * t) - P, P * (1 + r / n)^(n * t)# interest, total
end

source

GeneralLedger.getSampleDataTimeTaken — Function

getSampleDataTimeTaken(sampleSize::Int64=10)

Call this function to produce sample datasets showing Timetaken by vehicles.

Time taken to travel between two cities via Air, Bus, Train or personal vehicle depends mostly depends on speed and distance. However, there are other factors like weather, season, population or faults, which may occasionally impact travel time. However, knowing intensity of these factors, it's still possible to predict time taken rationally.

There are many assumptions when calculating Time taken, like, Passenger needs to travel to bus, train station or time to get ready, wait time etc. is not reflected here. Let's capture these assumptions as in column "bias" for now.

source

GeneralLedger.getSampleDepositsData — Function

getSampleDepositsData(sampleSize::Int64=10, P::Int64=10000, r::Float64=3.875, n::Int=1, t::Int=60)

Call this function to produce sample datasets showing deposits.
please see, this function generate only sample data with bias (intentionally biased and wrong, so that a pattern can be analyzed)
and must not be used for any real calculations.

sampleSize::Int64=10 # number of rows generated
P::Int64=10000 # Principal amount
r::Float64=3.875 # Rate of Interest
n::Int=1, # compound frequency - Daily=365, Monthly=12, Qtr=4, Annually=1
t::Int=60 # number of deposit months

source

GeneralLedger.getSampleRDeposit — Method

getSampleRDeposit(d:: Deposit)
Call this function to calculate Random Interest return.

P::Int=10000 # Principal amount
r::Float64=3.875 # Rate of Interest
n::Int=1, # compound frequency - Daily=365, Monthly=12, Qtr=4, Annually=1
t::Int=60 # number of deposit months

function getSampleRDeposit(d::Deposit) # random deposit
    P = d.principal
    x = rand(-0.51:0.02:0.99)
    return P * (1 + x) - P, P * (1 + x) # interest, total
end

source

GeneralLedger.getTokens — Method

getTokens(s, token_type) Call this function to extract tokens from string (example - extract urls)

First parameter is the complete string text, next parameter is list of words to be removed.

Example

julia> wd = getTokens("https://yahoo.com is the Yahoo website url", url)

source

GeneralLedger.getTreeData — Method

getTreeData(df_dname:: DataFrame, colName:: AbstractString) Call this function to create a flatten tree data structure.

First parameter is the DataFrame, next is Column name for which tree hierachies (flattened) are created.

Function returns a new dataframe with Levels.

Example

julia> df_dname = DataFrame(name=["John Doe", "John Doe","MICHAEL Doe", "Jacob Doe", "Julia Dpe", "Michael Jackson"],age=[35,35,35,10,5,45], state=["CA","CA","CA","CA","CO","CA"])
julia> catData = getTreeData(df_dname, "state")

source

GeneralLedger.getWebLinks — Method

getWebLinks(url::AbstractString, fileTypes::AbstractVector, downloadPath:: AbstractString)

Call this function to crawl through a web page and download all file links

first parameter is webpage url, next parameter is list of all file extensions user wish to download followed by output directory path

Example

julia> fl = getWebLinks("https://investor.apple.com/investor-relations/default.aspx#tabs_content--2021", ["pdf","csv","xlsx","xls"], "downloads/web")

source

GeneralLedger.getXLSinDirectory — Method

getXLSinDirectory(str)

Call this function to read all xls files inside directory

Example

julia> files = getXLSinDirectory("dirname")

source

GeneralLedger.getXMLintoDataFrame — Method

getXMLintoDataFrame(path::AbstractString)

Call this function to read XML file from url and retrieve results into DataFrame

first parameter is webpage url, next parameter is output directory path

Example

julia> df = getXMLintoDataFrame("https://api.coindesk.com/v1/bpi/currentprice.json", "downloads/web")

source

GeneralLedger.runSQL — Method

runSQL(conn:: AbstractString, sql:: AbstractString)

Call this function to run sql in database

Example

julia> res = runSQL(conn, SQLStatement)

source

GeneralLedger.setCloseConnection — Method

setCloseConnection()

Call this function to close database connection

Example

julia> cls = setCloseConnection()

source

GeneralLedger.setColNames — Method

setColNames(str)

Call this function to update column names

remove blank spaces, dollar sign, or Hash chars

make all columns uppercase and replace hyphen with underscore

source

GeneralLedger.setRemDuplicateRows — Method

setRemDuplicateRows(df_dname:: DataFrame, colNames:: Vector) Call this function to find & delete duplicates in a data frame column based on columnnames (key columns)

First parameter is the DataFrame, followed by all columnnames in DataFrame which are key columns in a dataframe.

Function returns dataframe after removing duplicates based on key columns provided. in case of duplicates, it retains first row.

Example

julia> df_dname = DataFrame(name=["John Doe", "John Doe","MICHAEL Doe", "Jacob Doe", "Julia Dpe", "Michael Jackson"],age=[35,35,35,10,5,45], state=["CA","CA","CA","CA","CO","CA"])
julia> dup = setRemDuplicateRows(df_dname, ["name","age"])

source

GeneralLedger.setRemoveText — Method

setRemoveText(str) Call this function to find and remove word in text string

First parameter is the complete string text, next parameter is list of words to be removed.

Example

julia> wd = setRemoveText("Amit Shukla Shkla Los Angel Angeles", ["Shkla", "Angel"])

source

GeneralLedger.setRemoveTokens — Method

setRemoveTokens(str) Call this function to remove tokens

Example

julia> wd = setRemoveTokens("Amit Shukla Shkla Los Angel Angeles")

source

GeneralLedger.setReplaceText — Method

setReplaceText(str) Call this function to find and replace word in text string

First parameter is the term to be replaced, next is the term replaced with followed by text string where text is searched and replaced.

Example

julia> wd = setReplaceText("Shkla","Shukla ","Amit Shkla Los Angeles")

source

GeneralLedger.Deposit — Type

Deposit{T}
defines a data structure for deposit calculations

source

GeneralLedger.SubRecordINSMetadata — Method

SubRecordINSMetadata(source::String, createDTTM::String, updateDTTM::String, author::String)

Stores metadata about records/document loaded into data lake. ...

Arguments

source::String: stores SOR (system of record i.e. source) name
createDTTM::String : data & time when data was first loaded.
updateDTTM::String : data & time when data was last updated (equals to createDTTM in case of no updates).
author::String : name of the person responsible for loading data.

...

source

GeneralLedger.SubRecordUPDMetadata — Method

SubRecordUPDMetadata(source::String, updateDTTM::String, author::String)

Stores metadata about records/document loaded into data lake. ...

Arguments

source::String: stores SOR (system of record i.e. source) name
updateDTTM::String : data & time when data was last updated (equals to createDTTM in case of no updates).
author::String : name of the person responsible for loading data.

...

source