Design Pattern
Now that we know what the objective of this project is, let's analyze what we already know.
We will first learn a few design patterns using examples from theme parks such as Disney, Universal, or Magic Mountain, which manage crowd gathering remarkably well.
For example, take a section of the map from the Disney or Universal theme parks.
Disney theme park map
Universal Studios theme park map
Pattern
Let's first analyze a few basic characteristics of theme parks.
To start with, assume that tickets for all of these theme parks are sold in advance or bought on the same day, and that most of the time fewer tickets are sold than the maximum occupancy allowed by capacity.
Remember, a crowd staying below the maximum allowed occupancy may not be the case in other types of crowd gatherings, such as protests, political rallies, or festive gatherings. We will address this later in the vision IoT section, where it becomes an important factor in detecting anomalies.
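Since that occupancy gap is the signal we will come back to later, here is a minimal Julia sketch of the idea: flag a gathering whose estimated headcount approaches or exceeds the permitted capacity. The function name, arguments, and the 0.9 threshold are illustrative assumptions, not part of any park's real system.
# a minimal sketch: flag a crowd as anomalous when the estimated headcount
# approaches or exceeds the permitted capacity (names and threshold are illustrative)
function occupancy_alert(estimatedCount, maxOccupancy; threshold = 0.9)
    ratio = estimatedCount / maxOccupancy
    status = ratio >= threshold ? "ALERT" : "OK"
    return "$status: $(round(100 * ratio; digits = 1))% of permitted capacity"
end
occupancy_alert(8_500, 10_000)   # "OK: 85.0% of permitted capacity"
occupancy_alert(11_200, 10_000)  # "ALERT: 112.0% of permitted capacity"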
Create Graph, Vertices and Edges (relationships)
Break down each ride in the park into its entity and characteristics (i.e., attributes); a minimal sketch of this breakdown follows this outline.
Gathering Visitor, Food Supply and other data
Create and load the visitor information register and other data.
IoT climate data
Gather IoT (Internet of Things) data from sensors.
Analyzing patterns
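As a small illustration of that entity-and-attribute breakdown, here is a plain Julia struct mirroring a few of the Ride attributes we will define in the graph schema later in this chapter; the struct itself is only a sketch and is not used by TigerGraph.
# a minimal sketch of a Ride entity with a few of its attributes
# (the full attribute list appears later in the CREATE VERTEX Ride statement)
struct RideEntity
    id::Int
    name::String
    indoor::Bool
    avgWaitTime::Int       # minutes
    popularityRating::Int  # 1-10
    numExits::Int
end
sampleRide = RideEntity(1, "Joy Ride", false, 45, 8, 2)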
Create Graph, Vertices and Edges
# we are using the Julia language for graph analysis
# TigerGraph provides REST API endpoints, GSQL, and GraphStudio to connect to TigerGraph
#######################################################################
# pyTigerGraph is a Python-based library to connect to the graph database and run GSQL
# we will use the Julia PyCall package to work with the pyTigerGraph library
#######################################################################
## **perhaps, some day, I will re-write the pyTigerGraph package in Julia ##
#######################################################################
# open a Julia REPL, Jupyter, or your favorite Julia IDE and run the following
# first, import all packages required to support our data analysis
# the rest of this chapter assumes that the packages below have been imported once
import Pkg
Pkg.add("DataFrames")
Pkg.add("CSV")
Pkg.add("PyCall")
Pkg.build("PyCall");
# you will also need to install pyTigerGraph in your python environment
# !pip install -U pyTigerGraph
Updating registry at `~/.julia/registries/General`
Updating git-repo `https://github.com/JuliaRegistries/General.git`
Resolving package versions...
No Changes to `~/.julia/environments/v1.7/Project.toml`
No Changes to `~/.julia/environments/v1.7/Manifest.toml`
Resolving package versions...
No Changes to `~/.julia/environments/v1.7/Project.toml`
No Changes to `~/.julia/environments/v1.7/Manifest.toml`
Resolving package versions...
No Changes to `~/.julia/environments/v1.7/Project.toml`
No Changes to `~/.julia/environments/v1.7/Manifest.toml`
Building Conda ─→ `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/6e47d11ea2776bc5627421d59cdcc1296c058071/build.log`
Building PyCall → `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/1fc929f47d7c151c839c5fc1375929766fb8edcc/build.log`
Before proceeding any further, please set up a TigerGraph Server instance at tgcloud.io. Please don't expect the credentials below to work for you, as there is a cost involved in keeping that instance running.
hostName = "https://p2p.i.tgcloud.io"
userName = "tigercloud"
password = "tigercloud"
graphName = "HazardAhead"
conn = tg.TigerGraphConnection(host=hostName, username=userName, password=password, graphname=graphName)
Now, once you have the TigerGraph and Julia environments set up, let's jump in and set up a sample graph, vertices, and edges to get the hang of the tools.
import Pkg
# you may not need to add Conda or pyTigerGraph
# if you already have a Python environment set up
# these instructions are specific to the Julia setup
Pkg.add("Conda")
ENV["PYTHON"] = "/usr/bin/python3"
using PyCall
using Conda
Conda.pip_interop(true)
# Conda.pip_interop(true; [env::Environment="/usr/bin/python3"])
Conda.pip("install", "pyTigerGraph")
Conda.add("pyTigerGraph")
tg = pyimport("pyTigerGraph")
# please don't expect the credentials below to work for you; sign up for your own instance at tgcloud
hostName = "https://p2p.i.tgcloud.io"
userName = "amit"
password = "password"
graphName = "HazardAhead"
conn = tg.TigerGraphConnection(host=hostName, username=userName, password=password, graphname=graphName)
# conn.gsql(getSchema)
PyObject <pyTigerGraph.pyTigerGraph.TigerGraphConnection object at 0x7f9fac7796d0>
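At this point it is worth a quick sanity check that the connection object actually talks to the server. As noted in the next section, plain GSQL commands such as ls do not need an authentication token, so a minimal check from Julia (reusing the same conn object) might look like this:
# a minimal connectivity check; "ls" lists the catalog and needs no token
catalog = conn.gsql("ls")
println(catalog)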
Operations that DO NOT need a Token
Viewing the schema of your graph using functions such as getSchema and getVertexTypes does not require you to have an authentication token. A token is also not required to run gsql commands through pyTigerGraph.
Sample Connection
conn = tg.TigerGraphConnection(host='https://pytigergraph-demo.i.tgcloud.io', username='tigergraph', password='password', graphname='DemoGraph')
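For instance, with a connection like the one above you can inspect the schema without any token. getSchema and getVertexTypes are the functions mentioned above; here they are called from Julia through the same PyCall wrapper, assuming the HazardAhead schema created later in this chapter already exists.
# schema inspection does not need a token
schema = conn.getSchema()            # full schema as a dictionary
vertexTypes = conn.getVertexTypes()  # e.g. ["Guest", "Ride", "FoodCourt"] once HazardAhead exists
println(vertexTypes)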
Operations that DO need a Token
A token is required to view or modify any actual DATA in the graph. Examples are: upserting data, deleting edges, and getting stats about any loaded vertices. A token is also required to get version data about the TigerGraph instance.
Sample Connection
conn = tg.TigerGraphConnection(host='https://pytigergraph-demo.i.tgcloud.io', username='tigergraph', password='password', graphname='DemoGraph', apiToken='av1im8nd2v06clbnb424jj7fp09hp049')
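As a sketch of the token-requiring operations listed above (upserting data and reading stats about loaded vertices), the calls below use the same conn object from Julia; the vertex id and attribute values are illustrative and assume the Guest vertex type created later in this chapter.
# these calls touch actual data, so the connection must carry a valid apiToken
# upsert a single Guest vertex (illustrative id and attributes)
conn.upsertVertex("Guest", 1001, Dict("name" => "Last First Name M.", "age" => 31))
# read a simple statistic about loaded vertices
println(conn.getVertexCount("Guest"))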
The code below is executed directly in a Python environment.
First, you will need to install pyTigerGraph in your Python environment:
!pip install -U pyTigerGraph
Then execute the following commands to create the TGCloud graph.
import pyTigerGraph as tg
hostName = "https://p2p.i.tgcloud.io"
userName = "amit"
password = "password"
graphName = "HazardAhead"
conn = tg.TigerGraphConnection(host=hostName, username=userName, password=password, graphname=graphName)
conn.gsql("ls")
conn.gsql('''USE GLOBAL
DROP ALL
''')
conn.gsql('''
USE GLOBAL
CREATE VERTEX Guest (PRIMARY_ID id INT, bookDate DATETIME, name STRING, phoneNo INT, age INT, gender STRING, checkIn DATETIME, checkOut DATETIME, specialNeeds BOOL, race STRING, price STRING, accompanies INT, family BOOL, localResident BOOL, ADDRESS STRING) WITH primary_id_as_attribute="true"
CREATE VERTEX Ride (PRIMARY_ID id INT, name STRING, indoor BOOL, inlets INT, outlets INT, temperature INT, avgWaitTime INT, popularityRating INT, rideType STRING, rideClass STRING, maturityRating STRING, numExits INT, area INT, numEmployees INT) WITH primary_id_as_attribute="true"
CREATE VERTEX FoodCourt (PRIMARY_ID id INT, name STRING, indoor BOOL, inlets INT, outlets INT, temperature INT, avgWaitTime INT, popularityRating INT, foodType STRING, numExits INT, area INT, numEmployees INT) WITH primary_id_as_attribute="true"
CREATE DIRECTED EDGE rides (From Guest, To Ride, rideTime DATETIME)
CREATE DIRECTED EDGE eats (From Guest, To FoodCourt, eatTime DATETIME)
CREATE UNDIRECTED EDGE accompanied (From Guest, To Guest)
''')
results = conn.gsql('CREATE GRAPH HazardAhead(Guest, Ride, FoodCourt, rides, eats, accompanied)')
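Once the vertex, edge, and graph definitions have run, a quick check can confirm that the schema is in place. The snippet below is a hedged sanity check from Julia via PyCall (the equivalent Python calls are identical); getVertexTypes and getEdgeTypes are standard pyTigerGraph functions.
# confirm the HazardAhead schema is in place
graphConn = tg.TigerGraphConnection(host=hostName, username=userName, password=password, graphname="HazardAhead")
println(graphConn.getVertexTypes())  # expect Guest, Ride, FoodCourt
println(graphConn.getEdgeTypes())    # expect rides, eats, accompanied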
Loading Data
conn.gsql('''
USE GLOBAL
USE GRAPH HazardAhead
CREATE LOADING JOB HazardAhead_PATH FOR GRAPH HazardAhead {
DEFINE FILENAME file1 = "sampleData/visitors.csv";
DEFINE FILENAME file2 = "sampleData/ride.csv";
DEFINE FILENAME file3 = "sampleData/foodcourt.csv";
DEFINE FILENAME file4 = "sampleData/rides.csv";
DEFINE FILENAME file5 = "sampleData/eats.csv";
DEFINE FILENAME file6 = "sampleData/accompanied.csv";
LOAD file1 TO VERTEX Guest VALUES ($0, $1, ...) USING header="true", separator=",";
LOAD file2 TO VERTEX Ride VALUES ($0, $1, ...) USING header="true", separator=",";
LOAD file3 TO VERTEX FoodCourt VALUES ($0, $1, ...) USING header="true", separator=",";
LOAD file4 TO EDGE rides VALUES ($0, $1, ...) USING header="true", separator=",";
LOAD file5 TO EDGE eats VALUES ($0, $1, ...) USING header="true", separator=",";
LOAD file6 TO EDGE accompanied VALUES ($0, $1, ...) USING header="true", separator=",";
}
''')
results = conn.gsql('RUN LOADING JOB HazardAhead_PATH USING file1="sampleData/visitors.csv", file2="sampleData/ride.csv", ...')
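If the CSV files live on your local machine rather than on the server, pyTigerGraph also offers runLoadingJobWithFile, which uploads a file and runs the job against it. The call below is only a sketch (one call per file tag defined in the job) and assumes the sampleData files generated in the next section; the exact arguments may differ across pyTigerGraph versions.
# a sketch, assuming the CSVs generated below are saved under sampleData/
# one upload-and-run call per file tag defined in the loading job
results = conn.runLoadingJobWithFile("sampleData/visitors.csv", "file1", "HazardAhead_PATH")
println(results)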
Gathering Visitor, Food Supply and other data
##############################################
# let's create 1000 visitors in visit register
##############################################
using DataFrames, CSV, Dates, Distributions
sampleSizeVisitor = 1000
visitorDF = DataFrame(
id = 1:1:sampleSizeVisitor,
bookDate = rand(Date("2020-04-01", dateformat"y-m-d"): Day(1): Date("2020-04-10", dateformat"y-m-d"), sampleSizeVisitor),
name = "Last First Name M.",
phoneNo = rand(1110000000:1:9988800000, sampleSizeVisitor),
age = rand(9:1:78, sampleSizeVisitor),
gender = rand(["Male","Female","Others","NA"], sampleSizeVisitor),
checkIn = rand(Date("2020-04-01", dateformat"y-m-d"): Day(1): Date("2020-04-10", dateformat"y-m-d"), sampleSizeVisitor),
checkOut = rand(Date("2020-04-01", dateformat"y-m-d"): Day(1): Date("2020-04-10", dateformat"y-m-d"), sampleSizeVisitor),
specialNeeds = rand([0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1], sampleSizeVisitor), # biased distributions, mostly false
race = "na",
price = rand(Normal(100, 2), sampleSizeVisitor),
accompanies = rand([1,2,3,4], sampleSizeVisitor),
family = rand([0,1], sampleSizeVisitor),
localResident = rand([0,1], sampleSizeVisitor),
ADDRESS = "Not available",
)
first(visitorDF,5)
5 rows × 15 columns

| | id | bookDate | name | phoneNo | age | gender | checkIn | checkOut | specialNeeds | race | price | accompanies | family | localResident | ADDRESS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Int64 | Date | String | Int64 | Int64 | String | Date | Date | Int64 | String | Float64 | Int64 | Int64 | Int64 | String |
| 1 | 1 | 2020-04-04 | Last First Name M. | 2900446033 | 28 | Others | 2020-04-06 | 2020-04-09 | 0 | na | 100.882 | 4 | 1 | 1 | Not available |
| 2 | 2 | 2020-04-05 | Last First Name M. | 6309075693 | 25 | Female | 2020-04-03 | 2020-04-06 | 0 | na | 104.687 | 1 | 1 | 1 | Not available |
| 3 | 3 | 2020-04-02 | Last First Name M. | 7549585449 | 52 | Female | 2020-04-03 | 2020-04-10 | 0 | na | 101.423 | 1 | 1 | 1 | Not available |
| 4 | 4 | 2020-04-03 | Last First Name M. | 6502426069 | 53 | Male | 2020-04-08 | 2020-04-09 | 0 | na | 100.103 | 1 | 1 | 1 | Not available |
| 5 | 5 | 2020-04-02 | Last First Name M. | 6220180785 | 23 | Male | 2020-04-10 | 2020-04-08 | 0 | na | 96.9288 | 4 | 1 | 0 | Not available |
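To make this synthetic register usable by the loading job defined earlier (which reads sampleData/visitors.csv), the DataFrame can be written out with CSV.jl; the directory name simply mirrors the paths assumed in that loading job, and the same pattern applies to the ride, food court, and edge data below.
# write the synthetic visitor register where the loading job expects it
mkpath("sampleData")
CSV.write("sampleData/visitors.csv", visitorDF)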
##############################################
# let's create 20 Rides in Park
##############################################
using DataFrames, CSV, Dates, Distributions
sampleSize = 20
rideDF = DataFrame(
id = 1:1:sampleSize,
name = "Joy Ride",
indoor = rand([0,1], sampleSize),
inlets = rand([1,2,3,4], sampleSize),
outlets = rand([1,2,3,4], sampleSize),
temperature = rand(64:1:94, sampleSize),
avgWaitTime = rand(5:1:110, sampleSize),
popularityRating = rand(1:1:10, sampleSize),
rideType = rand(["Adult","Teen","Kids", "YoungAdult"], sampleSize),
rideClass = rand(["Luxury", "Special"], sampleSize),
maturityRating = rand(1:1:10, sampleSize),
numExits = rand([1,2,3,4], sampleSize),
area = rand(5000:5:15000, sampleSize),
numEmployees = rand(1:1:5, sampleSize)
)
first(rideDF, 5)
5 rows × 14 columns

| | id | name | indoor | inlets | outlets | temperature | avgWaitTime | popularityRating | rideType | rideClass | maturityRating | numExits | area | numEmployees |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Int64 | String | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | String | String | Int64 | Int64 | Int64 | Int64 |
| 1 | 1 | Joy Ride | 0 | 4 | 3 | 80 | 82 | 7 | Teen | Luxury | 3 | 1 | 12055 | 2 |
| 2 | 2 | Joy Ride | 1 | 2 | 2 | 67 | 33 | 6 | Adult | Special | 10 | 3 | 6090 | 1 |
| 3 | 3 | Joy Ride | 0 | 4 | 2 | 74 | 86 | 7 | Kids | Special | 8 | 2 | 9840 | 1 |
| 4 | 4 | Joy Ride | 0 | 4 | 3 | 80 | 97 | 10 | Adult | Special | 3 | 3 | 7320 | 1 |
| 5 | 5 | Joy Ride | 1 | 2 | 2 | 78 | 31 | 5 | Adult | Special | 10 | 3 | 14250 | 5 |
##############################################
# let's create 20 Food Courts in Park
##############################################
using DataFrames, CSV, Dates, Distributions
sampleSize = 20
foodcourtDF = DataFrame(
id = 1:1:sampleSize,
name = "Joy Ride",
indoor = rand([0,1], sampleSize),
inlets = rand([1,2,3,4], sampleSize),
outlets = rand([1,2,3,4], sampleSize),
temperature = rand(64:1:94, sampleSize),
avgWaitTime = rand(5:1:110, sampleSize),
popularityRating = rand(1:1:10, sampleSize),
foodType = rand(["Fast","Formal","Snacks"], sampleSize),
numExits = rand([1,2,3,4], sampleSize),
area = rand(5000:5:15000, sampleSize),
numEmployees = rand(1:1:15, sampleSize)
)
first(foodcourtDF, 5)
5 rows × 12 columns

| | id | name | indoor | inlets | outlets | temperature | avgWaitTime | popularityRating | foodType | numExits | area | numEmployees |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Int64 | String | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | String | Int64 | Int64 | Int64 |
| 1 | 1 | Food Court | 0 | 2 | 2 | 92 | 55 | 2 | Formal | 1 | 7015 | 9 |
| 2 | 2 | Food Court | 0 | 2 | 1 | 91 | 24 | 1 | Formal | 4 | 8045 | 12 |
| 3 | 3 | Food Court | 0 | 1 | 1 | 85 | 107 | 5 | Fast | 1 | 10030 | 14 |
| 4 | 4 | Food Court | 0 | 2 | 3 | 76 | 64 | 2 | Formal | 2 | 10110 | 6 |
| 5 | 5 | Food Court | 0 | 4 | 4 | 76 | 8 | 10 | Fast | 4 | 13085 | 11 |
IoT climate data
##############################################
# let's create weather data
##############################################
using DataFrames, CSV, Dates, Distributions
sampleSize = 365
weatherDF = DataFrame(
cityid = 1:1:sampleSize,
state = rand(["LA","LA","FL"], sampleSize),
indoorTemp = rand(64:1:94, sampleSize),
outdoorTemp = rand(64:1:94, sampleSize),
wind = rand(5:1:30, sampleSize),
humidity = rand(30:1:70, sampleSize),
precipitation = rand(0:1:5, sampleSize)
)
first(weatherDF, 5)
5 rows × 7 columns

| | cityid | state | indoorTemp | outdoorTemp | wind | humidity | precipitation |
|---|---|---|---|---|---|---|---|
| | Int64 | String | Int64 | Int64 | Int64 | Int64 | Int64 |
| 1 | 1 | FL | 64 | 94 | 15 | 32 | 5 |
| 2 | 2 | LA | 84 | 85 | 12 | 38 | 4 |
| 3 | 3 | FL | 66 | 76 | 8 | 50 | 4 |
| 4 | 4 | LA | 73 | 90 | 9 | 52 | 3 |
| 5 | 5 | LA | 78 | 69 | 7 | 61 | 3 |
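With the synthetic vertex and weather data in hand, even a quick DataFrames summary hints at the patterns we will dig into: for example, average wait time by ride type, or summary statistics of the weather columns. This is only a local sanity check on the generated frames, not the graph analysis itself.
using Statistics
# average wait time per ride type in the synthetic rideDF
combine(groupby(rideDF, :rideType), :avgWaitTime => mean => :meanWaitTime)
# quick summary statistics of the synthetic weather data
describe(weatherDF)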