Design Pattern
Now that we know what the objective of this project is, let's analyze what we already know.
We will first learn a few design patterns using examples from theme parks such as Disney, Universal, or Magic Mountain, which manage crowd gathering remarkably well.
For example, take a section of the map from the Disney or Universal theme parks.
Disney theme park map
Universal Studios theme park map
Pattern
Let's first analyze a few basic characteristics of theme parks.
To start with, assume that tickets for all of these theme parks are sold in advance or bought on the same day, and that most of the time fewer tickets are sold than the maximum occupancy allowed by capacity.
Remember, a crowd staying below the maximum allowed occupancy may not be the case in other types of crowd gatherings, such as protests, political rallies, or festive gatherings. We will address this later in the vision IoT section, where it becomes an important factor in detecting anomalies.
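Since that occupancy gap is the signal we will come back to later, here is a minimal Julia sketch of the idea: flag a gathering whose estimated headcount approaches or exceeds the permitted capacity. The function name, arguments, and the 0.9 threshold are illustrative assumptions, not part of any park's real system.
# a minimal sketch: flag a crowd as anomalous when the estimated headcount
# approaches or exceeds the permitted capacity (names and threshold are illustrative)
function occupancy_alert(estimatedCount, maxOccupancy; threshold = 0.9)
    ratio = estimatedCount / maxOccupancy
    status = ratio >= threshold ? "ALERT" : "OK"
    return "$status: $(round(100 * ratio; digits = 1))% of permitted capacity"
end
occupancy_alert(8_500, 10_000)   # "OK: 85.0% of permitted capacity"
occupancy_alert(11_200, 10_000)  # "ALERT: 112.0% of permitted capacity"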
Create Graph, Vertices and Edges (relationships)
Break down each ride in the park into its entity and characteristics (i.e., attributes); a minimal sketch of this breakdown follows this outline.
Gathering Visitor, Food Supply and other data
Create and load the visitor information register and other data.
IoT climate data
Gather IoT (Internet of Things) data from sensors.
Analyzing patterns
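As a small illustration of that entity-and-attribute breakdown, here is a plain Julia struct mirroring a few of the Ride attributes we will define in the graph schema later in this chapter; the struct itself is only a sketch and is not used by TigerGraph.
# a minimal sketch of a Ride entity with a few of its attributes
# (the full attribute list appears later in the CREATE VERTEX Ride statement)
struct RideEntity
    id::Int
    name::String
    indoor::Bool
    avgWaitTime::Int       # minutes
    popularityRating::Int  # 1-10
    numExits::Int
end
sampleRide = RideEntity(1, "Joy Ride", false, 45, 8, 2)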
Create Graph, Vertices and Edges
# we are using the Julia language for graph analysis
# TigerGraph provides REST API endpoints, GSQL, and GraphStudio to connect to TigerGraph
#######################################################################
# pyTigerGraph is a Python-based library to connect to the graph database and run GSQL
# we will use the Julia PyCall package to work with the pyTigerGraph library
#######################################################################
## **perhaps, some day, I will re-write the pyTigerGraph package in Julia ##
#######################################################################
# open a Julia REPL, Jupyter, or your favorite Julia IDE and run the following
# first, import all packages required to support our data analysis
# the rest of this chapter assumes that the packages below have been imported once
import Pkg
Pkg.add("DataFrames")
Pkg.add("CSV")
Pkg.add("PyCall")
Pkg.build("PyCall");
# you will also need to install pyTigerGraph in your python environment
# !pip install -U pyTigerGraph
Updating registry at `~/.julia/registries/General`
Updating git-repo `https://github.com/JuliaRegistries/General.git`
Resolving package versions...
No Changes to `~/.julia/environments/v1.7/Project.toml`
No Changes to `~/.julia/environments/v1.7/Manifest.toml`
Resolving package versions...
No Changes to `~/.julia/environments/v1.7/Project.toml`
No Changes to `~/.julia/environments/v1.7/Manifest.toml`
Resolving package versions...
No Changes to `~/.julia/environments/v1.7/Project.toml`
No Changes to `~/.julia/environments/v1.7/Manifest.toml`
Building Conda ─→ `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/6e47d11ea2776bc5627421d59cdcc1296c058071/build.log`
Building PyCall → `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/1fc929f47d7c151c839c5fc1375929766fb8edcc/build.log`
Before proceeding any further, please set up a TigerGraph Server instance at tgcloud.io. Please don't expect the credentials below to work for you, as there is a cost involved in keeping that instance running.
hostName = "https://p2p.i.tgcloud.io"
userName = "tigercloud"
password = "tigercloud"
graphName = "HazardAhead"
conn = tg.TigerGraphConnection(host=hostName, username=userName, password=password, graphname=graphName)
Now, once you have the TigerGraph and Julia environments set up, let's jump in and set up a sample graph, vertices, and edges to get the hang of the tools.
import Pkg
# you may not need to add Conda or pyTigerGraph
# if you already have a Python environment set up
# these instructions are specific to the Julia setup
Pkg.add("Conda")
ENV["PYTHON"] = "/usr/bin/python3"
using PyCall
using Conda
Conda.pip_interop(true)
# Conda.pip_interop(true; [env::Environment="/usr/bin/python3"])
Conda.pip("install", "pyTigerGraph")
Conda.add("pyTigerGraph")
tg = pyimport("pyTigerGraph")
# please don't expect the credentials below to work for you; sign up for your own instance at tgcloud
hostName = "https://p2p.i.tgcloud.io"
userName = "amit"
password = "password"
graphName = "HazardAhead"
conn = tg.TigerGraphConnection(host=hostName, username=userName, password=password, graphname=graphName)
# conn.gsql(getSchema)
PyObject <pyTigerGraph.pyTigerGraph.TigerGraphConnection object at 0x7f9fac7796d0>
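At this point it is worth a quick sanity check that the connection object actually talks to the server. As noted in the next section, plain GSQL commands such as ls do not need an authentication token, so a minimal check from Julia (reusing the same conn object) might look like this:
# a minimal connectivity check; "ls" lists the catalog and needs no token
catalog = conn.gsql("ls")
println(catalog)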
Operations that DO NOT need a Token
Viewing the schema of your graph using functions such as getSchema and getVertexTypes does not require you to have an authentication token. A token is also not required to run gsql commands through pyTigerGraph.
Sample Connection
conn = tg.TigerGraphConnection(host='https://pytigergraph-demo.i.tgcloud.io', username='tigergraph', password='password', graphname='DemoGraph')
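For instance, with a connection like the one above you can inspect the schema without any token. getSchema and getVertexTypes are the functions mentioned above; here they are called from Julia through the same PyCall wrapper, assuming the HazardAhead schema created later in this chapter already exists.
# schema inspection does not need a token
schema = conn.getSchema()            # full schema as a dictionary
vertexTypes = conn.getVertexTypes()  # e.g. ["Guest", "Ride", "FoodCourt"] once HazardAhead exists
println(vertexTypes)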
Operations that DO need a Token
A token is required to view or modify any actual DATA in the graph. Examples are: upserting data, deleting edges, and getting stats about any loaded vertices. A token is also required to get version data about the TigerGraph instance.
Sample Connection
conn = tg.TigerGraphConnection(host='https://pytigergraph-demo.i.tgcloud.io', username='tigergraph', password='password', graphname='DemoGraph', apiToken='av1im8nd2v06clbnb424jj7fp09hp049')
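As a sketch of the token-requiring operations listed above (upserting data and reading stats about loaded vertices), the calls below use the same conn object from Julia; the vertex id and attribute values are illustrative and assume the Guest vertex type created later in this chapter.
# these calls touch actual data, so the connection must carry a valid apiToken
# upsert a single Guest vertex (illustrative id and attributes)
conn.upsertVertex("Guest", 1001, Dict("name" => "Last First Name M.", "age" => 31))
# read a simple statistic about loaded vertices
println(conn.getVertexCount("Guest"))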
The code below is executed directly in a Python environment.
First, you will need to install pyTigerGraph in your Python environment:
!pip install -U pyTigerGraph
Then execute the following commands to create the TGCloud graph.
import pyTigerGraph as tg
hostName = "https://p2p.i.tgcloud.io"
userName = "amit"
password = "password"
graphName = "HazardAhead"
conn = tg.TigerGraphConnection(host=hostName, username=userName, password=password, graphname=graphName)
conn.gsql("ls")
conn.gsql('''USE GLOBAL
DROP ALL
''')
conn.gsql('''
USE GLOBAL
CREATE VERTEX Guest (PRIMARY_ID id INT, bookDate DATETIME, name STRING, phoneNo INT, age INT, gender STRING, checkIn DATETIME, checkOut DATETIME, specialNeeds BOOL, race STRING, price STRING, accompanies INT, family BOOL, localResident BOOL, ADDRESS STRING) WITH primary_id_as_attribute="true"
CREATE VERTEX Ride (PRIMARY_ID id INT, name STRING, indoor BOOL, inlets INT, outlets INT, temperature INT, avgWaitTime INT, popularityRating INT, rideType STRING, rideClass STRING, maturityRating STRING, numExits INT, area INT, numEmployees INT) WITH primary_id_as_attribute="true"
CREATE VERTEX FoodCourt (PRIMARY_ID id INT, name STRING, indoor BOOL, inlets INT, outlets INT, temperature INT, avgWaitTime INT, popularityRating INT, foodType STRING, numExits INT, area INT, numEmployees INT) WITH primary_id_as_attribute="true"
CREATE DIRECTED EDGE rides (From Guest, To Ride, rideTime DATETIME)
CREATE DIRECTED EDGE eats (From Guest, To FoodCourt, eatTime DATETIME)
CREATE UNDIRECTED EDGE accompanied (From Guest, To Guest)
''')
results = conn.gsql('CREATE GRAPH HazardAhead(Guest, Ride, FoodCourt, rides, eats, accompanied)')
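Once the vertex, edge, and graph definitions have run, a quick check can confirm that the schema is in place. The snippet below is a hedged sanity check from Julia via PyCall (the equivalent Python calls are identical); getVertexTypes and getEdgeTypes are standard pyTigerGraph functions.
# confirm the HazardAhead schema is in place
graphConn = tg.TigerGraphConnection(host=hostName, username=userName, password=password, graphname="HazardAhead")
println(graphConn.getVertexTypes())  # expect Guest, Ride, FoodCourt
println(graphConn.getEdgeTypes())    # expect rides, eats, accompanied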
Loading Data
conn.gsql('''
USE GLOBAL
USE GRAPH HazardAhead
CREATE LOADING JOB HazardAhead_PATH FOR GRAPH HazardAhead {
DEFINE FILENAME file1 = "sampleData/visitors.csv";
DEFINE FILENAME file2 = "sampleData/ride.csv";
DEFINE FILENAME file3 = "sampleData/foodcourt.csv";
DEFINE FILENAME file4 = "sampleData/rides.csv";
DEFINE FILENAME file5 = "sampleData/eats.csv";
DEFINE FILENAME file6 = "sampleData/accompanied.csv";
LOAD file1 TO VERTEX Guest VALUES ($0, $1, ...) USING header="true", separator=",";
LOAD file2 TO VERTEX Ride VALUES ($0, $1, ...) USING header="true", separator=",";
LOAD file3 TO VERTEX FoodCourt VALUES ($0, $1, ...) USING header="true", separator=",";
LOAD file4 TO EDGE rides VALUES ($0, $1, ...) USING header="true", separator=",";
LOAD file5 TO EDGE eats VALUES ($0, $1, ...) USING header="true", separator=",";
LOAD file6 TO EDGE accompanied VALUES ($0, $1, ...) USING header="true", separator=",";
}
''')
results = conn.gsql('RUN LOADING JOB HazardAhead_PATH USING file1="sampleData/visitors.csv", file2="sampleData/ride.csv", ...')
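If the CSV files live on your local machine rather than on the server, pyTigerGraph also offers runLoadingJobWithFile, which uploads a file and runs the job against it. The call below is only a sketch (one call per file tag defined in the job) and assumes the sampleData files generated in the next section; the exact arguments may differ across pyTigerGraph versions.
# a sketch, assuming the CSVs generated below are saved under sampleData/
# one upload-and-run call per file tag defined in the loading job
results = conn.runLoadingJobWithFile("sampleData/visitors.csv", "file1", "HazardAhead_PATH")
println(results)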
Gathering Visitor, Food Supply and other data
##############################################
# let's create 1000 visitors in visit register
##############################################
using DataFrames, CSV, Dates, Distributions
sampleSizeVisitor = 1000
visitorDF = DataFrame(
id = 1:1:sampleSizeVisitor,
bookDate = rand(Date("2020-04-01", dateformat"y-m-d"): Day(1): Date("2020-04-10", dateformat"y-m-d"), sampleSizeVisitor),
name = "Last First Name M.",
phoneNo = rand(1110000000:1:9988800000, sampleSizeVisitor),
age = rand(9:1:78, sampleSizeVisitor),
gender = rand(["Male","Female","Others","NA"], sampleSizeVisitor),
checkIn = rand(Date("2020-04-01", dateformat"y-m-d"): Day(1): Date("2020-04-10", dateformat"y-m-d"), sampleSizeVisitor),
checkOut = rand(Date("2020-04-01", dateformat"y-m-d"): Day(1): Date("2020-04-10", dateformat"y-m-d"), sampleSizeVisitor),
specialNeeds = rand([0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1], sampleSizeVisitor), # biased distributions, mostly false
race = "na",
price = rand(Normal(100, 2), sampleSizeVisitor),
accompanies = rand([1,2,3,4], sampleSizeVisitor),
family = rand([0,1], sampleSizeVisitor),
localResident = rand([0,1], sampleSizeVisitor),
ADDRESS = "Not available",
)
first(visitorDF,5)
5 rows × 15 columns

| | id | bookDate | name | phoneNo | age | gender | checkIn | checkOut | specialNeeds | race | price | accompanies | family | localResident | ADDRESS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Int64 | Date | String | Int64 | Int64 | String | Date | Date | Int64 | String | Float64 | Int64 | Int64 | Int64 | String |
| 1 | 1 | 2020-04-04 | Last First Name M. | 2900446033 | 28 | Others | 2020-04-06 | 2020-04-09 | 0 | na | 100.882 | 4 | 1 | 1 | Not available |
| 2 | 2 | 2020-04-05 | Last First Name M. | 6309075693 | 25 | Female | 2020-04-03 | 2020-04-06 | 0 | na | 104.687 | 1 | 1 | 1 | Not available |
| 3 | 3 | 2020-04-02 | Last First Name M. | 7549585449 | 52 | Female | 2020-04-03 | 2020-04-10 | 0 | na | 101.423 | 1 | 1 | 1 | Not available |
| 4 | 4 | 2020-04-03 | Last First Name M. | 6502426069 | 53 | Male | 2020-04-08 | 2020-04-09 | 0 | na | 100.103 | 1 | 1 | 1 | Not available |
| 5 | 5 | 2020-04-02 | Last First Name M. | 6220180785 | 23 | Male | 2020-04-10 | 2020-04-08 | 0 | na | 96.9288 | 4 | 1 | 0 | Not available |
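To make this synthetic register usable by the loading job defined earlier (which reads sampleData/visitors.csv), the DataFrame can be written out with CSV.jl; the directory name simply mirrors the paths assumed in that loading job, and the same pattern applies to the ride, food court, and edge data below.
# write the synthetic visitor register where the loading job expects it
mkpath("sampleData")
CSV.write("sampleData/visitors.csv", visitorDF)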
##############################################
# let's create 20 Rides in Park
##############################################
using DataFrames, CSV, Dates, Distributions
sampleSize = 20
rideDF = DataFrame(
id = 1:1:sampleSize,
name = "Joy Ride",
indoor = rand([0,1], sampleSize),
inlets = rand([1,2,3,4], sampleSize),
outlets = rand([1,2,3,4], sampleSize),
temperature = rand(64:1:94, sampleSize),
avgWaitTime = rand(5:1:110, sampleSize),
popularityRating = rand(1:1:10, sampleSize),
rideType = rand(["Adult","Teen","Kids", "YoungAdult"], sampleSize),
rideClass = rand(["Luxury", "Special"], sampleSize),
maturityRating = rand(1:1:10, sampleSize),
numExits = rand([1,2,3,4], sampleSize),
area = rand(5000:5:15000, sampleSize),
numEmployees = rand(1:1:5, sampleSize)
)
first(rideDF, 5)
5 rows × 14 columns

| | id | name | indoor | inlets | outlets | temperature | avgWaitTime | popularityRating | rideType | rideClass | maturityRating | numExits | area | numEmployees |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Int64 | String | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | String | String | Int64 | Int64 | Int64 | Int64 |
| 1 | 1 | Joy Ride | 0 | 4 | 3 | 80 | 82 | 7 | Teen | Luxury | 3 | 1 | 12055 | 2 |
| 2 | 2 | Joy Ride | 1 | 2 | 2 | 67 | 33 | 6 | Adult | Special | 10 | 3 | 6090 | 1 |
| 3 | 3 | Joy Ride | 0 | 4 | 2 | 74 | 86 | 7 | Kids | Special | 8 | 2 | 9840 | 1 |
| 4 | 4 | Joy Ride | 0 | 4 | 3 | 80 | 97 | 10 | Adult | Special | 3 | 3 | 7320 | 1 |
| 5 | 5 | Joy Ride | 1 | 2 | 2 | 78 | 31 | 5 | Adult | Special | 10 | 3 | 14250 | 5 |
##############################################
# let's create 20 Food Courts in Park
##############################################
using DataFrames, CSV, Dates, Distributions
sampleSize = 20
foodcourtDF = DataFrame(
id = 1:1:sampleSize,
name = "Joy Ride",
indoor = rand([0,1], sampleSize),
inlets = rand([1,2,3,4], sampleSize),
outlets = rand([1,2,3,4], sampleSize),
temperature = rand(64:1:94, sampleSize),
avgWaitTime = rand(5:1:110, sampleSize),
popularityRating = rand(1:1:10, sampleSize),
foodType = rand(["Fast","Formal","Snacks"], sampleSize),
numExits = rand([1,2,3,4], sampleSize),
area = rand(5000:5:15000, sampleSize),
numEmployees = rand(1:1:15, sampleSize)
)
first(foodcourtDF, 5)
5 rows × 12 columns

| | id | name | indoor | inlets | outlets | temperature | avgWaitTime | popularityRating | foodType | numExits | area | numEmployees |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Int64 | String | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | String | Int64 | Int64 | Int64 |
| 1 | 1 | Food Court | 0 | 2 | 2 | 92 | 55 | 2 | Formal | 1 | 7015 | 9 |
| 2 | 2 | Food Court | 0 | 2 | 1 | 91 | 24 | 1 | Formal | 4 | 8045 | 12 |
| 3 | 3 | Food Court | 0 | 1 | 1 | 85 | 107 | 5 | Fast | 1 | 10030 | 14 |
| 4 | 4 | Food Court | 0 | 2 | 3 | 76 | 64 | 2 | Formal | 2 | 10110 | 6 |
| 5 | 5 | Food Court | 0 | 4 | 4 | 76 | 8 | 10 | Fast | 4 | 13085 | 11 |
IoT climate data
##############################################
# let's create weather data
##############################################
using DataFrames, CSV, Dates, Distributions
sampleSize = 365
weatherDF = DataFrame(
cityid = 1:1:sampleSize,
state = rand(["LA","LA","FL"], sampleSize),
indoorTemp = rand(64:1:94, sampleSize),
outdoorTemp = rand(64:1:94, sampleSize),
wind = rand(5:1:30, sampleSize),
humidity = rand(30:1:70, sampleSize),
precipitation = rand(0:1:5, sampleSize)
)
first(weatherDF, 5)
5 rows × 7 columns

| | cityid | state | indoorTemp | outdoorTemp | wind | humidity | precipitation |
|---|---|---|---|---|---|---|---|
| | Int64 | String | Int64 | Int64 | Int64 | Int64 | Int64 |
| 1 | 1 | FL | 64 | 94 | 15 | 32 | 5 |
| 2 | 2 | LA | 84 | 85 | 12 | 38 | 4 |
| 3 | 3 | FL | 66 | 76 | 8 | 50 | 4 |
| 4 | 4 | LA | 73 | 90 | 9 | 52 | 3 |
| 5 | 5 | LA | 78 | 69 | 7 | 61 | 3 |
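With the synthetic vertex and weather data in hand, even a quick DataFrames summary hints at the patterns we will dig into: for example, average wait time by ride type, or summary statistics of the weather columns. This is only a local sanity check on the generated frames, not the graph analysis itself.
using Statistics
# average wait time per ride type in the synthetic rideDF
combine(groupby(rideDF, :rideType), :avgWaitTime => mean => :meanWaitTime)
# quick summary statistics of the synthetic weather data
describe(weatherDF)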