GraphFlash

GraphFlash is a graph processing framework running on serverless architectures. This is the AWS Lambda version of GraphFlash.

Install dependencies

GraphFlash uses Python for code generation. Use the instructions below to install the dependencies. The scripts below can be run in an Alpine Docker image.

apk add --no-cache \
    build-base \
    cmake \
    git \
    curl-dev \
    openssl-dev \
    zstd-dev \
    boost-dev \
    nlohmann-json \
    hiredis-dev \
    libcurl \
    ninja \
    asio-dev

cd /root && git clone https://github.com/gflags/gflags.git && cd gflags && \
    mkdir build && cd build && \
    cmake .. -DCMAKE_POSITION_INDEPENDENT_CODE=ON && \
    make -j$(nproc) && make install

cd /root && git clone https://github.com/google/glog.git && cd glog \
    && git checkout v0.6.0 \
    && mkdir build && cd build && \
    cmake .. && \
    make -j$(nproc) && make install

cd /root && git clone https://github.com/google/googletest.git && cd googletest && mkdir build && cd build && \
    cmake .. && make -j$(nproc) && make install

cd /root && git clone https://github.com/redis/hiredis.git && cd hiredis && \
    make -j$(nproc) USE_SSL=1 && make USE_SSL=1 install

cd /root && git clone https://github.com/sewenew/redis-plus-plus.git && cd redis-plus-plus && mkdir build && cd build && \
    cmake -DREDIS_PLUS_PLUS_USE_TLS=ON .. && make -j$(nproc) && make install

cd /root && git clone --recurse-submodules https://github.com/aws/aws-sdk-cpp.git && cd aws-sdk-cpp && \
    mkdir -p build && cd build && \
    cmake .. \
        -DCMAKE_BUILD_TYPE=Release \
        -DBUILD_ONLY="core;s3" \
        -DENABLE_UNITY_BUILD=ON \
        -DBUILD_SHARED_LIBS=ON \
        -DENABLE_TESTING=OFF \
        -DCMAKE_INSTALL_PREFIX=$HOME/aws-sdk-cpp-install && \
    make -j$(nproc) && make install

cd /root && git clone https://github.com/libcpr/cpr.git && cd cpr && mkdir build && cd build && \
    cmake .. -DCPR_USE_SYSTEM_CURL=ON -DBUILD_SHARED_LIBS=OFF && \
    cmake --build . --parallel && \
    cmake --install .

cd /root && git clone https://github.com/awslabs/aws-lambda-cpp.git && cd aws-lambda-cpp && mkdir build && cd build && \
    cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$HOME/lambda-install && \
    make -j$(nproc) && make install

Deploy AWS Lambda Functions

  1. Build the Docker images and push them to AWS ECR.
docker buildx build \
--platform linux/arm64 \
--build-arg TARGET_EXECUTABLE=coordinator \
-t {REPLACE_WITH_ECR_URL}/g0-coordinator:1.0 \
--push \
--provenance=false \
--sbom=false \
--output=type=registry,oci-mediatypes=false .

docker buildx build \
--platform linux/arm64 \
--build-arg TARGET_EXECUTABLE=worker \
-t {REPLACE_WITH_ECR_URL}/g0-worker:1.0 \
--push \
--provenance=false \
--sbom=false \
--output=type=registry,oci-mediatypes=false .
  2. Deploy the functions.
# example script
aws lambda create-function \
  --function-name g0-coordinator-func \
  --package-type Image \
  --code ImageUri={REPLACE_WITH_ECR_URL}/g0-coordinator:1.0 \
  --role XXX \
  --architectures arm64 \
  --region XXX

aws lambda create-function \
  --function-name g0-worker-func \
  --package-type Image \
  --code ImageUri={REPLACE_WITH_ECR_URL}/g0-worker:1.0 \
  --role XXX \
  --architectures arm64 \
  --region XXX
  3. Create a function URL for each of the two functions and add a permission that allows access to it. You may also adjust the function configuration, such as the timeout and memory limit.
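The last step can be sketched in Python as follows. This is a minimal sketch, not the project's own tooling: it assumes a boto3 Lambda client is passed in, that a public URL (`AuthType="NONE"`) is acceptable, and the helper name `expose_function` and the statement ID are hypothetical. Adjust the auth type and permissions to your own security requirements.

```python
def expose_function(lambda_client, function_name, timeout_s=None, memory_mb=None):
    """Create a function URL for `function_name` and return the URL string.

    `lambda_client` is expected to behave like boto3.client("lambda").
    """
    # Assumption: AuthType="NONE" (publicly callable URL).
    response = lambda_client.create_function_url_config(
        FunctionName=function_name,
        AuthType="NONE",
    )

    # Resource-based policy statement that lets callers invoke the URL.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId="AllowPublicFunctionUrl",  # hypothetical statement ID
        Action="lambda:InvokeFunctionUrl",
        Principal="*",
        FunctionUrlAuthType="NONE",
    )

    # Optional tuning of timeout (seconds) and memory (MB); skipped if unset.
    config = {}
    if timeout_s is not None:
        config["Timeout"] = timeout_s
    if memory_mb is not None:
        config["MemorySize"] = memory_mb
    if config:
        lambda_client.update_function_configuration(
            FunctionName=function_name, **config
        )

    return response["FunctionUrl"]


# Usage (requires boto3 and AWS credentials; region is a placeholder):
# import boto3
# client = boto3.client("lambda", region_name="XXX")
# for name in ("g0-coordinator-func", "g0-worker-func"):
#     print(name, expose_function(client, name))
```

The client is passed in rather than created inside the helper so the same code can be exercised with a stub before touching a real account.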

Run

Download the code and build it.

git clone --recurse-submodules -b lambda git@github.com:disnetlab/GraphFlash.git
cd GraphFlash

# Configure
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_PREFIX_PATH=$HOME/aws-sdk-cpp-install \
      -Daws-lambda-runtime_DIR=$HOME/lambda-install/lib/aws-lambda-runtime/cmake
# Build the CLI tools used in the demo below
cmake --build build --target partition_tiny upload download combine

The plugin shared libraries (lib*.so) are produced inside the build/ folder next to the worker binary; ship them alongside the worker executable so dlopen can locate them at runtime.

extra_args is a flat JSON object passed verbatim to the plugin. Common parameters such as activate_superstep are sent at the top level of the request and automatically wired into every algorithm.
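The request layout described above can be sketched in Python. This is a minimal sketch under assumptions: the helper name `build_request` is hypothetical, the field names are taken from the request shown in the Demo section, and "flat" is interpreted here as "no nested objects or arrays".

```python
import json


def build_request(common, extra_args):
    """Return a JSON request body.

    Common parameters (for example activate_superstep) stay at the top level;
    algorithm-specific parameters go into the flat extra_args object.
    """
    # Assumption: "flat" means every value is a scalar, not a dict or list.
    for key, value in extra_args.items():
        if isinstance(value, (dict, list)):
            raise ValueError(f"extra_args must be flat; {key!r} is nested")

    body = dict(common)
    body["extra_args"] = dict(extra_args)
    return json.dumps(body)


# Usage with values from the demo request:
body = build_request(
    {"algorithm": "BFS", "partition_num": 2, "activate_superstep": 1},
    {"source-vertex": 239044},
)
```

Rejecting nested values early surfaces a malformed request on the client side instead of inside the plugin.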

Demo

This demo processes the kgs example dataset from LDBC.

First, download the dataset and decompress it into the current directory.

ls kgs
# output should be: 
# kgs-BFS  kgs-CDLP  kgs-LCC  kgs-PR  kgs-SSSP  kgs-WCC  kgs.e  kgs.properties  kgs.v
# partition the file into two partitions
mkdir kgs-p2
./partition_tiny \
  --dataset_name=kgs-p2 \
  --directed=false \
  --edge_file=./kgs/kgs.e \
  --vertex_file=./kgs/kgs.v \
  --vertex_num=832247 \
  --edge_num=17891698 \
  --weighted=true \
  --partition_num=2 \
  --output_directory=./kgs-p2


# run a Redis server on 127.0.0.1:6379 and upload the partitions to MaaS
./upload \
  --input_directory=./kgs-p2 \
  --dataset_name=kgs-p2 \
  --partition_num=2 \
  --bucket=XXX \
  --access_key=XXX \
  --secret_key=XXX
  
# trigger the function
curl -X POST XXX \
  -H "Content-Type: application/json" \
  -d '{
    "algorithm": "BFS",
    "max_workers": 2,
    "partition_num": 2,
    "vertex_num": 832247,
    "run_id": "kgs-p2-BFS",
    "dataset": "kgs-p2",
    "directed": false,
    "weighted": true,
    "maas_addr": "XXX",
    "worker_url": "XXX",
    "bucket": "XXX",
    "access_key": "XXX",
    "secret_key": "XXX",
    "activate_superstep": 1,
    "thread_num": 1,
    "extra_args": {
      "source-vertex": 239044
    }
  }'
 
# `activate_superstep` is optional; omit it when the algorithm should begin activating outer vertices from the first
# incremental round (the default value is 1). Algorithm-specific settings stay in `extra_args` as a flat object.

# download results
./download --output_directory=./kgs-p2/ --partition_num=2 --run_id=kgs-p2-BFS --type=int --access_key=XXX --secret_key=XXX --bucket=XXX

# generate the final result
./combine --dataset_name=kgs-p2 --directory=./kgs-p2 --partition_num=2 --run_id=kgs-p2-BFS --type=int --vertex_num=832247

# check the result
diff kgs-p2/kgs-p2-BFS.g0r kgs/kgs-BFS

Note that ./partition remaps the vertex IDs to consecutive integers, so any vertex ID used in the request body should be the remapped ID. The mapping is stored in {DATASET_NAME}.g0m; in the example above, that is ./kgs-p2/kgs-p2.g0m. To look up the remapped ID for 239044, run:

grep -n '^239044$' kgs-p2/kgs-p2.g0m | cut -d: -f1 | awk '{print $1-1}'
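The same lookup can be sketched in Python. This is a minimal sketch that only mirrors the grep command above: it assumes each line of the .g0m file holds one original vertex ID and that the zero-based line index is the remapped ID. The function name `remapped_id` is hypothetical and the .g0m format is inferred from that command, not from a specification.

```python
def remapped_id(g0m_path, original_id):
    """Return the zero-based index of the line equal to original_id, or None."""
    target = str(original_id)
    with open(g0m_path, encoding="utf-8") as mapping:
        for index, line in enumerate(mapping):
            # Exact whole-line match, like grep '^ID$'.
            if line.rstrip("\n") == target:
                return index
    return None


# Usage:
# remapped_id("kgs-p2/kgs-p2.g0m", 239044)
```

Returning None for a missing ID makes it easy to detect a vertex that is not present in the partitioned dataset before sending a request.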
