
What are Quorums?

  • A middle ground between synchronous and asynchronous updates

  • Updates are propagated asynchronously, but they do not commit until a majority of replicas has acknowledged them

  • Reads can no longer contact just a single replica if stale reads are to be avoided - quorum sizes must be chosen to preclude concurrent updates and to guarantee that read and write quorums intersect

Quorums: For N replicas and read/write quorum sizes R/W:

No stale reads?

No concurrent updates?

  • No stale reads: R+W > N

  • No concurrent updates: W > N/2

  • If these conditions are violated: sloppy quorum
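
A minimal Python sketch of checking both conditions, with illustrative values for N, R, and W:

```python
def check_quorum(n: int, r: int, w: int) -> None:
    """Check the two quorum conditions for N replicas and quorum sizes R/W."""
    no_stale_reads = r + w > n         # every read quorum overlaps every write quorum
    no_concurrent_updates = w > n / 2  # any two write quorums intersect
    print(f"N={n}, R={r}, W={w}: no stale reads: {no_stale_reads}, "
          f"no concurrent updates: {no_concurrent_updates}")

check_quorum(3, 2, 2)  # both conditions hold
check_quorum(3, 1, 2)  # 1 + 2 is not > 3: stale reads become possible
```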

What are the four replica placement strategies?

  • Global mapping

  • Hashing

  • Chaining

  • Scattering

Replica placement strategies / Global mapping:

Pro/Contra

Examples

  • Storage systems control replica placement in a single centralized component

  • Pro

    • Supports arbitrarily complex and intelligent replica placement decisions

  • Contra

    • Comes with inherent scalability and availability challenges

      • Single point of failure

      • All control flow needs to pass through a centralized component

  • Examples

    • GFS

      • A single master makes all placement and selection decisions in the cluster

      • Shadow master servers to improve availability

    • Nebula

      • Grid-inspired distributed edge store

      • Centrally controlled placement in the DataStore master

Replica placement strategies / Hashing:

Pro/Contra

Examples

  • Hash-value (usually of the data item’s key) is used to deterministically identify a set of machines which will then store the data item

  • Pro

    • Scales very well as replica placement and selection are decentralized

  • Contra

    • Does not cope well with high node churn rates

    • Not a good fit for fog deployments as the full determinism of the static hash function

      • Makes it hard to consider the underlying network topologies in replica placement

      • Does not allow placing data close to the actual access location based on current demand

  • Example

    • Chord

      • Nodes and data items are assigned an m-bit ID

      • IDs are arranged on a circular ID space modulo 2^m

      • A data item is stored on the first node whose ID is greater than or equal to its own ID (its successor)

      • Each node holds pointers to its predecessor and successor, which are used to look up data (see the sketch after this list)

    • PAST

    • Kademlia

    • Dynamo

    • Cassandra
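
A minimal Python sketch of Chord-style hash placement; the SHA-1 hash, the 16-bit ID space, and the node names are illustrative assumptions:

```python
import hashlib

M = 16          # m-bit ID space (illustrative)
RING = 2 ** M

def chord_id(name: str) -> int:
    """Hash a node name or data key onto the circular m-bit ID space."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING

def successor(key: str, node_ids: list[int]) -> int:
    """The node responsible for `key`: the first node whose ID is greater
    than or equal to the key's ID, wrapping around the ring."""
    kid = chord_id(key)
    candidates = [nid for nid in sorted(node_ids) if nid >= kid]
    return candidates[0] if candidates else min(node_ids)  # wrap around

nodes = [chord_id(f"node-{i}") for i in range(5)]
print(successor("sensor-reading-42", nodes))
```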

Replica placement strategies / Chaining:

Pro/Contra

Examples

  • Additional replicas are created (deterministically) on machines adjacent to a primary replica selected through some other replica placement strategy

  • Pro

    • Makes it possible to control where chaining replicas should reside

  • Contra

    • Tends to cluster replicas in close physical proximity

    • Is relatively static and thus not well equipped for dynamic replica movement

  • Example

    • Dynamo

      • Primary replica is selected through consistent hashing

      • Additional replicas are placed on the next N-1 nodes on the ring, as defined by the replication factor (see the sketch after this list)

      • If a temporary node failure occurs -> a slightly relaxed version of consistent hashing with chaining is used

        • The first N healthy nodes, starting from the key range of the primary replica, act as replicas until the original N nodes are available again

        • Hinted handoffs

    • Cassandra

      • A feature called snitches avoids storing multiple replicas on machines in the same rack or in the same datacenter
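
A Dynamo-style chaining sketch, reusing chord_id and nodes from the hashing sketch above; the replication factor of 3 is illustrative:

```python
def preference_list(key: str, node_ids: list[int], n: int) -> list[int]:
    """The primary replica is the key's successor on the ring; the remaining
    n-1 replicas are chained onto the next distinct nodes clockwise."""
    ring = sorted(node_ids)
    kid = chord_id(key)
    # Index of the primary: first node with ID >= key ID (0 wraps around).
    start = next((i for i, nid in enumerate(ring) if nid >= kid), 0)
    return [ring[(start + offset) % len(ring)] for offset in range(n)]

print(preference_list("sensor-reading-42", nodes, 3))  # primary + 2 chained replicas
```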

Replica placement strategies / Scattering:

Pro/Contra

Examples

  • Creates a pseudorandomized but deterministic distribution of replicas across machines

  • Pro

    • Can be used for good placement in geo-distributed deployments

  • Contra

    • Poorly equipped to deal with end-user mobility and the resulting access patterns across system nodes

  • Example

    • CRUSH

      • Computes a pseudorandom data placement distribution based on a hierarchical description of the target cluster

What are hybrid approaches? (replica placement strategies)

  • The four strategies are not a natural fit for the fog

  • Combination of different strategies is the way to go

  • Examples

    • PNUTS

      • Global mapping for replica placement within a region and full replication across regions

    • DynamoDB

      • Offers global mapping for cross-region replication

    • FARSITE

      • Combines global mapping with scattering

    • IPFS

      • BitTorrent-inspired protocol for data exchange and a hashing algorithm to determine storage locations

Case Studies / IPFS + RozoFS

  • IPFS is a peer-to-peer distributed file system

  • Objects are content-addressable

  • Merkle DAG

  • Advantages

    • Content-based Addressing

    • Tamper-proof

      • Content is verified with a checksum

    • No duplication

      • Objects with the same content have the same ID

  • Technologies

    • DHT

    • Block Exchange - BitTorrent

    • Version Control Systems - Git

    • Self-Certifying Filesystem

  • Not fog-ready: because of the slow DHT

    • Use a scale-out NAS (RozoFS) to enable site reads without using the DHT

Case Studies / Global Data Plane

  • Data-centric abstraction focused on the distribution, preservation, and protection of information

  • Builds upon append-only, single-writer logs

    • Lightweight and durable

    • Multiple simultaneous reads

    • No fixed location, migrated as necessary

    • Compositions are achieved by subscriptions

  • Location-independent routing

    • Large 256-bit address space

    • Packets are routed through an overlay network that uses a DHT

      • Enables flexible placement, controllable replication and simple migration of logs

    • GDP places logs within the infrastructure and advertises the location to the underlying routing layer

    • Placement and replication of logs can be optimized for latency, QoS, privacy, and durability

    • Logs themselves are split into chunks whose placement can be optimized for durability and performance

Case Studies / FBase (FReD)

  • Application controlled replica placement

  • Key abstractions

    • Nodes

      • Group of one or more machines within one geographical site

      • Including a hosted or embedded storage system

      • Nodes only interact with other nodes as a whole (not with individual machines)

      • Coordination within nodes is done through the storage system

    • Keygroups

      • Group of data items that are replicated together

      • Own ACL

      • Applications declaratively specify the set of keygroup members, which controls data distribution (see the sketch after this list)

    • Keygroup members

      • One or both roles:

      • Replica nodes

        • Store a data replica, serve client requests, and manage keygroup configuration

      • Trigger nodes

        • Receive all updates as a stream of events and may trigger external systems via an event-based interface

      • Applications can specify a TTL for data retention on replica nodes
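
A sketch of the keygroup abstraction as plain data structures; all names and fields are illustrative, not FBase's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Member:
    node: str
    is_replica: bool = True         # stores a replica and serves client requests
    is_trigger: bool = False        # receives all updates as a stream of events
    ttl_seconds: int | None = None  # optional data-retention TTL on replica nodes

@dataclass
class Keygroup:
    name: str
    acl: dict[str, list[str]] = field(default_factory=dict)  # keygroup's own ACL
    members: list[Member] = field(default_factory=list)

kg = Keygroup(name="sensor-data", acl={"app-1": ["read", "write"]})
kg.members.append(Member(node="edge-berlin", ttl_seconds=3600))
kg.members.append(Member(node="cloud-eu"))
kg.members.append(Member(node="analytics", is_replica=False, is_trigger=True))
```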

Why is FaaS promising for the edge?

  • Higher utilization of scarce edge resources

  • Stateless functions can be moved as needed

  • Event-driven is a good fit for many edge/fog applications

  • Flexibility

Platforms for the Edge / LeanOpenWhisk

  • OpenWhisk is too heavy for the fog

  • Replaced the heaviest components

  • Is fully compatible with OpenWhisk and part of the OpenWhisk releases

  • Cons: Still a lot of unnecessary code that was written for cloud-based deployments

Platforms for the Edge / tinyFaaS

Is it sufficient for the edge?

  • Goals: lightweight, extensible, HTTP or HTTP-compatible

  • Key mechanisms

    • Remove as many components as possible

    • CoAP as application protocol

    • Parallel execution of requests within a container

      • One container per client or per function

  • Experiments

    • Compare tinyFaaS to: native node.js, Lean OpenWhisk, Kubeless

    • Infrastructure: Raspberry Pi 3 B+

    • Measure latency at different load levels with hard SLA

    • Results

      • Native node.js: tinyFaaS adds only very low overhead

      • Lean OW: does not work on a Raspberry Pi

      • Kubeless: comparable at very low load, but does not scale

  • Sufficient for the edge?

    • Pro

      • Designed for small nodes

      • Much more efficient than alternative solutions

    • Contra

      • No support for cluster-based or on-device deployment

Platforms for the Edge / NanoLambda

Is it sufficient for the edge?

  • Targeted at extremely resource-constrained devices

  • Subset of standard libraries

  • Implements AWS Lambda API

  • Builds on CSPOT

  • Lightweight Python VM for bytecode (IoTPy)

  • Remote compilation (cloud/edge): source code is never delivered to the device

  • Life cycle of function

    • Deploy to NanoLambda Cloud/Edge service

      • Stores code

      • Compiles and caches compact bytecode representation on-demand

    • IoT device requests bytecode from NanoLambda service

  • Sufficient for the edge?

    • Pro

      • Designed for small edge nodes and on-device

      • Cloud compatibility

    • Contra

      • No support for cloud-based clusters

Testing, Benchmarking and Monitoring:

Which stage?

Which level?

Combinations?

 

  • Testing

    • Testing stage

    • Method/function level

    • Focus on functional behavior

    • + Monitoring: Live testing

    • + Benchmarking: Performance test / Microbenchmarks

  • Benchmarking

    • Testing stage

    • System level

    • Stress test

    • Focus on QoS

    • + Monitoring: Service / API benchmarking

  • Monitoring

    • Production stage

    • System level

    • Passive observation

    • Focus on QoS

What are the phases of testing?

  • Unit testing

  • Integration & Live Testing

    • Canary Testing

    • Dark Launches

    • A/B Testing

Cloud integration tests vs. Fog Integration tests

  • Cloud integration tests

    • Mock services, data, devices

    • Evaluate corner-cases which usually should not exist in production

  • Fog Integration tests

    • Much more difficult because of physical infrastructure

    • (partial) solution: virtualize & emulate fog environment in the cloud

What is live testing?

Examples?

  • Test new software version in production

  • Monitor what happens

    • While rolling out an update gradually

    • While directing part of the traffic to old and/or new version

  • Example

    • Blue/Green Deployments

      • Deploy new version to blue environment

      • Smoke tests against the blue system

      • Switch traffic from green to blue

      • Switch back to green on errors

    • Canary releasing

      • Rollout of a new version only to a subset of production servers

      • Easy to revert

      • Use it for A/B testing

      • Check capacity requirements by incrementally increasing the load

    • Dark/Shadow Launches

      • Functionality is deployed in a production environment without being visible or activated

      • Production traffic is duplicated and routed to the shadow version as well

      • The shadow version is observed without impacting the user
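
A minimal sketch of the shadow-launch idea; the endpoints are hypothetical, and the key point is that the shadow path must never affect the user:

```python
import concurrent.futures
import urllib.request

PROD_URL = "http://prod.example.com/api"      # serves the user (hypothetical)
SHADOW_URL = "http://shadow.example.com/api"  # only receives a copy (hypothetical)

pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def mirror(body: bytes) -> None:
    try:  # observe/log the shadow response; failures never reach the user
        urllib.request.urlopen(SHADOW_URL, data=body, timeout=5).read()
    except Exception:
        pass

def handle(body: bytes) -> bytes:
    pool.submit(mirror, body)  # fire-and-forget duplicate of production traffic
    with urllib.request.urlopen(PROD_URL, data=body, timeout=5) as resp:
        return resp.read()     # only the production response goes to the user
```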

Live testing in Fog environment?

What are alternative solutions for the edge part?

  • Cloud part: ok

  • Difficult on edge devices which

    • May not have the capacity to run two versions in parallel

    • May have safety requirements which make canary releases impossible

  • Find a separate solution for the edge part

    • Mock edge devices in the cloud

    • Have a physical testbed

Deploying cloud applications vs. Deploying fog applications

  • Deploying cloud applications

    • Changes are pushed to devices via IaC (infrastructure as code)

    • New virtual devices are created, configured and deployed with new version

      • Old instances are disconnected/terminated

  • Deploying fog applications

    • Edge devices often need to be physically connected at least once for deploying the first version

    • Use an app store-like approach

      • Update is sent to central software repository

      • The deployed application frequently checks for updates and self-updates if necessary => pull approach (see the sketch after this list)

    • Plan for incompatibilities and different versions on devices

    • Use versioned interfaces
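
A sketch of the pull approach on the device, assuming a hypothetical repository endpoint and a placeholder install step:

```python
import time
import urllib.request

REPO_URL = "http://repo.example.com/latest-version"  # hypothetical repository
CURRENT_VERSION = "1.4.2"
CHECK_INTERVAL = 3600  # seconds between update checks

def download_and_install(version: str) -> None:
    # Placeholder: fetch the artifact, verify it, swap binaries, restart.
    print(f"self-updating to {version}")

while True:
    try:
        with urllib.request.urlopen(REPO_URL, timeout=10) as resp:
            latest = resp.read().decode().strip()
        if latest != CURRENT_VERSION:
            download_and_install(latest)
    except Exception:
        pass  # repository unreachable: keep running the current version
    time.sleep(CHECK_INTERVAL)
```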

What is benchmarking?

What is a benchmarking tool?

  • Benchmarking is a way to systematically study the quality of cloud services based on experiments

  • Benchmarking tool creates an artificial load on the SUT, while carefully tracking detailed quality metrics
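
A minimal closed-loop benchmarking-client sketch; the SUT endpoint and the request count are assumptions:

```python
import statistics
import time
import urllib.request

SUT_URL = "http://sut.example.com/api"  # hypothetical system under test
REQUESTS = 100

latencies, errors = [], 0
for _ in range(REQUESTS):               # generate artificial load ...
    start = time.perf_counter()
    try:
        urllib.request.urlopen(SUT_URL, timeout=5).read()
        latencies.append(time.perf_counter() - start)  # ... and track quality metrics
    except Exception:
        errors += 1

print(f"errors: {errors}/{REQUESTS}")
if latencies:
    print(f"mean latency: {statistics.mean(latencies) * 1000:.1f} ms")
    p99 = sorted(latencies)[int(0.99 * (len(latencies) - 1))]
    print(f"p99 latency:  {p99 * 1000:.1f} ms")
```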

What are the benchmarking design objectives?

  • Relevance

    • Benchmark the important parts

    • Mimic real-world use

  • Repeatability

    • Maximize determinism in the benchmark

  • Fairness

    • Treat all SUTs the same

  • Portability

    • Avoid assumptions about the SUT

    • Make the benchmark broadly applicable

  • Understandability

    • Have an intuitive benchmark specification

What are fog-specific benchmarking challenges?

  • Geo-distribution of experiments

  • Deployment of benchmarking clients for edge-based SUTs

  • Distributed measurements of QoS

    • E2E latency in an IoT data processing pipeline

  • Multi-workload scenarios

    • Event-driven at the edge

    • OLAP and OLTP in the cloud

  • Complex analysis and results

What are the benchmarking implementation objectives?

  • Correctness

    • Assert adherence of implementation to specification

  • Distribution

    • Build the benchmarking tool for distributed deployments

    • Keep coordination to before the benchmark run

    • Consider clock synchronization

  • Fine-grained logging

    • Never discard information unless absolutely necessary

  • Reproducibility

    • Use repeatable benchmarks

    • Repeat often

    • Run sufficiently long

    • Document setting

  • Portability

    • Use an adapter design (see the sketch after this list)

    • Consider extensibility and evolution

    • Avoid assumptions on the SUT

  • Ease of use

    • Document everything

    • Provide instructions

    • Release code
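
A sketch of the adapter design for portability: the benchmark core programs against a generic interface only, so supporting a new SUT means writing a new adapter rather than changing the benchmark (names are illustrative):

```python
from abc import ABC, abstractmethod

class SUTAdapter(ABC):
    """Generic read/write interface the benchmark core talks to."""
    @abstractmethod
    def read(self, key: str) -> str | None: ...
    @abstractmethod
    def write(self, key: str, value: str) -> None: ...

class InMemoryAdapter(SUTAdapter):
    """Trivial adapter, useful for testing the benchmark itself."""
    def __init__(self) -> None:
        self.store: dict[str, str] = {}
    def read(self, key: str) -> str | None:
        return self.store.get(key)
    def write(self, key: str, value: str) -> None:
        self.store[key] = value

def run_workload(sut: SUTAdapter) -> None:
    sut.write("k1", "v1")          # the workload is SUT-agnostic
    assert sut.read("k1") == "v1"

run_workload(InMemoryAdapter())
```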

Platforms & Applications / Basic Design Principles

State-of-the-Art: Cloud systems?

  • Microservice-based design

  • Infrastructure automation

  • Fault-tolerance through replication

  • Cluster-based deployment only in a few datacenters

    • Fog: single-node to cluster sized deployments on millions of sites

Platforms & Applications / Basic Design Principles

Geo-awareness in the cloud vs. Geo-awareness in fog

  • Geo-awareness in the cloud

    • Limited to large regions

    • High latency if the closest data center is quite far

    • Introducing fog nodes

      • Fast connection to nearby fog nodes but limited bandwidth to cloud

      • Access points of mobile devices must be adapted based on their location

  • Geo-awareness in the fog

    • The infrastructure needs to expose location and network topology explicitly

Platforms & Applications / Basic Design Principles

Fault tolerance for cloud applications?

Fault tolerance in fog applications?

  • Fault tolerance in cloud applications

    • Redundant servers

    • Retry-on-error principle (with other service instances)

    • Monitor services and their workload, auto-scaling

    • Chaos Monkey randomly shuts down services to check whether the system adapts and catches the outage

  • Fault tolerance in fog applications

    • The prevalence of faults depends on the number of nodes

      • Systems and/or their components fail continuously

      • Connection infrastructure fails or operates with reduced quality

        • Power outage

        • Some devices only transmit data under certain conditions (e.g., sunlight)

        • Eventual consistency problems may result in stale datasets

      • Buffer messages until their receiver is available again

      • Expect data staleness and ordering issues

      • Cache data aggressively

      • Compress data items as much as possible on unreliable connections

      • Plan with incompatibility, constantly monitor software versions on devices

      • Design for loose coupling

Platforms & Applications / Basic Design Principles

Geo-awareness in fog applications: What requirements?

  • Must be aware of its deployment location

  • Needs to handle client movement (handover to other edge devices)

  • Must be prepared to move components elsewhere (stateless application logic)

  • Must move data when necessary

  • May not rely on the availability of remote components

Case Studies / DeFog

  • Motivation

    • Application can be deployed in different ways

    • Various hardware options exist on the edge

    • How can we compare them?

  • Deployment options

    • Three deployment modes

      • Cloud only

      • Edge only

      • Cloud-Edge (Fog)

    • Docker as deployment vehicle

  • Approach

    • Use a set of representative benchmark applications

    • Measure E2E performance as well as low-level metrics

    • 6 applications

      • Latency critical

      • Bandwidth intensive

      • Location aware

      • Compute intensive

Case Studies / BeFaaS

  • Benchmarking fog-based FaaS platforms

  • Federated deployments

    • Different cloud providers

  • Workloads

    • E-commerce application

    • IoT application

Case Studies / MockFog

  • How to evaluate a fog application?

    • Without testing infrastructure

      • Guesses, small local testbeds, and simulation

    • Operate additional edge machines

      • Expensive; must be at the same sites as the production machines

    • Idea: Use an emulated fog infrastructure testbed that is set up in the cloud

      • Size/power of VMs: cloud instance types and Docker resource limits

      • Network characteristics: tc, iptables, etc.

    • MockFog

  • Three modules

    • Infrastructure emulation

    • Application management

    • Experiment orchestration

      • Comprises a finite set of states

      • Failure testing

  • Node Manager and Node Agents