Set of flashcards Details

| Flashcards | 72 |
|---|---|
| Language | English |
| Category | Computer Science |
| Level | University |
| Created / Updated | 28.11.2020 / 17.07.2021 |
| Weblink | https://card2brain.ch/box/20201128_dbt |
What are Quorums?
A middle ground between synchronous and asynchronous updates
Updates are propagated asynchronously, but do not commit until a majority of replicas has acknowledged them
Reads can no longer contact just a single replica; to avoid stale reads, quorum sizes must be set so that read and write quorums intersect and concurrent updates are precluded
Quorums: For N replicas and read/write quorum sizes R/W:
No stale reads?
No concurrent updates?
No stale reads: R+W > N
No concurrent updates: W > N/2
If these conditions are violated: sloppy quorum
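The two quorum conditions above can be sketched as simple checks (a minimal illustration; the function names are my own, not from any particular system):

```python
# Minimal sketch: check the strict-quorum conditions for N replicas
# with read quorum size R and write quorum size W.

def no_stale_reads(n: int, r: int, w: int) -> bool:
    # Read and write quorums must intersect in at least one replica.
    return r + w > n

def no_concurrent_updates(n: int, w: int) -> bool:
    # Two write quorums must overlap, so each needs a strict majority.
    return w > n / 2

def is_strict_quorum(n: int, r: int, w: int) -> bool:
    # If either condition is violated, the system runs a sloppy quorum.
    return no_stale_reads(n, r, w) and no_concurrent_updates(n, w)
```

For example, N=3 with R=2 and W=2 satisfies both conditions, while R=1 and W=2 permits stale reads.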
What are the four replica placement strategies?
Global mapping
Hashing
Chaining
Scattering
Replica placement strategies / Global mapping:
Pro/Contra
Examples
Storage systems control replica placement in a single centralized component
Pro
Supports arbitrary complex and intelligent replica placement decisions
Contra
Comes with natural scalability and availability challenges
Single point of failure
All control flow needs to pass through a centralized component
Examples
GFS
Single master, entire placement and selection in the cluster
Shadow master servers to improve availability
Nebula
Grid-inspired distributed edge store
Centrally controlled placement in the DataStore master
Replica placement strategies / Hashing:
Pro/Contra
Examples
Hash-value (usually of the data item’s key) is used to deterministically identify a set of machines which will then store the data item
Pro
Scales very well as replica placement and selection are decentralized
Contra
Does not cope well with high node churn rates
Not a good fit for fog deployments, as the full determinism of the static hash function
Makes it hard to consider the underlying network topology in replica placement
Does not allow placing data close to the actual access location based on current demand
Example
Chord
Assigns nodes and data items an m-bit ID
IDs are arranged on a circular ring modulo 2^m
A data item is stored on the node whose ID is greater than or equal to the item's ID
Each node holds a pointer to its predecessor and successor, which are used to look up data
PAST
Kademlia
Dynamo
Cassandra
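Chord's placement rule can be sketched as follows (an illustrative shortcut using a sorted list instead of Chord's real finger-table lookup):

```python
# Sketch of Chord's placement rule: a data item is stored on the first
# node whose ID is greater than or equal to the item's ID, wrapping
# around the ring modulo 2^m.

import bisect

def successor(node_ids: list, item_id: int, m: int) -> int:
    ring = sorted(i % 2**m for i in node_ids)
    idx = bisect.bisect_left(ring, item_id % 2**m)
    # Wrap around to the first node if no node ID >= the item ID exists.
    return ring[idx % len(ring)]
```

With nodes 10, 40, and 100 on an 8-bit ring, item 50 lands on node 100, and item 200 wraps around to node 10.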
Replica placement strategies / Chaining:
Pro/Contra
Examples
Additional replicas are created (deterministically) on adjacent machines of a primary replica selected through some other replica placement strategy
Pro
Makes it possible to control where chaining replicas should reside
Contra
Tends to cluster replicas in close physical proximity
Is relatively static, so not well equipped for dynamic replica movement
Example
Dynamo
Primary replica is selected through consistent hashing
Additional replicas are placed on the next N-1 nodes on the ring as defined by the replication factor
If a temporary node failure occurs -> a slightly relaxed version of consistent hashing with chaining is used
The first N healthy nodes, starting from the key range identified for the primary replica, act as replicas until the original N nodes are available again
Hinted handoffs
Cassandra
A feature called snitches prevents storing replicas on machines in the same rack or in the same datacenter
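The chaining rule in Dynamo's example above (primary via consistent hashing, additional replicas on the next nodes clockwise) can be sketched as building a preference list (names are illustrative):

```python
# Sketch of chaining on a consistent-hashing ring: given the primary
# replica's position, the N-1 clockwise neighbours hold the additional
# replicas, wrapping around the ring.

def preference_list(ring: list, primary_idx: int, n: int) -> list:
    # Take the primary and the next n-1 nodes clockwise.
    return [ring[(primary_idx + i) % len(ring)] for i in range(n)]
```

For a ring of four nodes and a replication factor of 3, a primary at position 2 yields the primary plus its two clockwise neighbours, wrapping around to the ring's start.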
Replica placement strategies / Scattering:
Pro/Contra
Examples
Creates a pseudorandomized but deterministic distribution of replicas across machines
Pro
Can be used for good placement in geo-distributed deployments
Contra
Poorly equipped to deal with end-user mobility and the resulting access patterns across system nodes
Example
CRUSH
Computes a pseudorandom data placement distribution based on a hierarchical description of the target cluster
What are hybrid approaches? (replica placement strategies)
The four strategies are not a natural fit for the fog
Combination of different strategies is the way to go
Examples
PNUTS
Global mapping for replica placement within a region and full replication across regions
DynamoDB
Offers global mapping for cross-region replication
FARSITE
Combines global mapping with scattering
IPFS
BitTorrent-inspired protocol for data exchange and a hashing algorithm to determine storage locations
Case Studies / IPFS + RozoFS
IPFS is a peer to peer distributed file system
Objects are content-addressable
Merkle DAG
Advantages
Content-based Addressing
Tamper-proof
Content is verified with a checksum
No duplication
Objects with the same content have the same ID
Technologies
DHT
Block Exchange - BitTorrent
Version Control Systems - Git
Self-Certifying Filesystem
Not fog-ready: because of the slow DHT
Use a scale-out NAS (RozoFS) to enable site reads without using the DHT
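The content-based addressing described above can be sketched with a plain hash function (SHA-256 stands in here for IPFS's multihash; the store is an illustrative in-memory dict):

```python
# Sketch of content-based addressing: an object's ID is a hash of its
# content, so identical objects deduplicate to one ID and any tampering
# changes the ID.

import hashlib

def content_id(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

store = {}

def put(data: bytes) -> str:
    cid = content_id(data)
    store[cid] = data  # duplicates collapse onto the same key
    return cid

def get(cid: str) -> bytes:
    data = store[cid]
    # Content is verified against its checksum on read.
    assert content_id(data) == cid, "content was tampered with"
    return data
```

Putting the same bytes twice yields the same ID and only one stored object, which is exactly the "no duplication" advantage.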
Case Studies / Global Data Plane
Data-centric abstraction focused on the distribution, preservation, and protection of information
Builds upon append-only, single-writer logs
Lightweight and durable
Multiple simultaneous reads
No fixed location, migrated as necessary
Compositions are achieved by subscriptions
Location-independent routing
Large 256-bit address space
Packages are routed through an overlay network that uses a DHT
Enables flexible placement, controllable replication and simple migration of logs
GDP places logs within the infrastructure and advertises the location to the underlying routing layer
Placement and replication of logs can be optimized for latency, QoS, privacy, and durability
Logs themselves are split into chunks whose placement can be optimized for durability and performance
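GDP's core abstraction, a single-writer append-only log with subscription-based composition, can be sketched as follows (class and method names are my own illustration, not GDP's actual API):

```python
# Sketch of a single-writer, append-only log with subscriptions.

class AppendOnlyLog:
    def __init__(self, writer: str):
        self.writer = writer            # only this identity may append
        self.records = []
        self.subscribers = []

    def append(self, writer: str, record: bytes) -> int:
        if writer != self.writer:
            raise PermissionError("single-writer log")
        self.records.append(record)     # records are only ever appended
        seq = len(self.records) - 1
        for cb in self.subscribers:     # compositions via subscriptions
            cb(seq, record)
        return seq

    def read(self, seq: int) -> bytes:
        # Multiple simultaneous reads are unproblematic: records never change.
        return self.records[seq]

    def subscribe(self, cb) -> None:
        self.subscribers.append(cb)
```

A subscriber receives every update as a stream of (sequence number, record) events, which is what enables composing logs.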
Case Studies / FBase (FReD)
Application controlled replica placement
Key abstractions
Nodes
Group of one or more machines within one geographical site
Including a hosted or embedded storage system
Nodes only interact with other nodes as a whole (not with individual machines)
Coordination within nodes is done through the storage system
Keygroups
Group of data items that are replicated together
Own ACL
Applications declaratively specify the set of keygroup members, which controls data distribution
Keygroup members
One or both roles:
Replica nodes
Store a data replica, serve client requests, and manage keygroup configuration
Trigger nodes
Receive all updates as a stream of events and may trigger external systems via an event-based interface
Applications can specify a TTL for data retention on replica nodes
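A keygroup configuration in the spirit of the abstractions above can be sketched like this (field and class names are illustrative, not the real FBase/FReD API):

```python
# Sketch of a keygroup: a set of members, each with a replica or
# trigger role, and an optional TTL for data retention on replica nodes.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Member:
    node: str
    role: str                          # "replica" or "trigger"
    ttl_seconds: Optional[int] = None  # data retention on replica nodes

@dataclass
class Keygroup:
    name: str
    members: list = field(default_factory=list)

    def replica_nodes(self) -> list:
        return [m.node for m in self.members if m.role == "replica"]

    def trigger_nodes(self) -> list:
        return [m.node for m in self.members if m.role == "trigger"]

kg = Keygroup("sensor-readings", [
    Member("berlin-edge", "replica", ttl_seconds=3600),
    Member("cloud-eu", "replica"),
    Member("alerting", "trigger"),
])
```

The application controls data distribution purely by declaring this member list; replica nodes serve clients, while the trigger node only receives the update stream.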
Why is FaaS promising for the edge?
Higher utilization of scarce edge resources
Stateless functions can be moved as needed
Event-driven is a good fit for many edge/fog applications
Flexibility
Platforms for the Edge / LeanOpenWhisk
OpenWhisk is too heavy for the fog
Replaced the heaviest components
Is fully compatible with OpenWhisk and part of the OpenWhisk releases
Cons: Still a lot of unnecessary code that was written for cloud-based deployments
Platforms for the Edge / tinyFaaS
Is it sufficient for the edge?
Goals: lightweight, extensible, HTTP or HTTP-compatible
Key mechanisms
Remove as many components as possible
CoAP as application protocol
Parallel execution of requests within a container
One container per client or per function
Experiments
Compare tinyFaaS to: native node.js, Lean OpenWhisk, Kubeless
Infrastructure: Raspberry Pi 3 B+
Measure latency at different load levels with hard SLA
Results
Native node.js: tinyFaaS adds very low overhead
Lean OW: does not work on a RaspberryPi
Kubeless: comparable at very low load, but does not scale
Suffice for the edge?
Pro
Designed for small nodes
Much more efficient than alternative solutions
Contra
No support for cluster-based or on-device deployment
Platforms for the Edge / NanoLambda
Is it sufficient for the edge?
Targeted at extremely resource-constrained devices
Subset of standard libraries
Implements AWS Lambda API
Builds on CSPOT
Lightweight python VM for bytecodes (IoTPy)
Remove compilation (cloud/edge): code is never delivered to device
Life cycle of function
Deploy to NanoLambda Cloud/Edge service
Stores code
Compiles and caches compact bytecode representation on-demand
IoT device requests bytecode from NanoLambda service
Suffice for the edge?
Pro
Designed for small edge nodes and on-device
Cloud compatibility
Contra
No support for cloud-based clusters
Testing, Benchmarking and Monitoring:
Which stage?
Which level?
Combinations?
Testing
Testing stage
Method/function level
Focus on functional behavior
+ Monitoring: Live testing
+ Benchmarking: Performance test / Microbenchmarks
Benchmarking
Testing stage
System level
Stress test
Focus on QoS
+ Monitoring: Service / API benchmarking
Monitoring
Production stage
System level
Passive observation
Focus on QoS
What are the phases of testing?
Unit testing
Integration & Live Testing
Canary Testing
Dark Launches
A/B Testing
Cloud integration tests vs. Fog Integration tests
Cloud integration tests
Mock services, data, devices
Evaluate corner-cases which usually should not exist in production
Fog Integration tests
Much more difficult because of physical infrastructure
(partial) solution: virtualize & emulate fog environment in the cloud
What is live testing?
Examples?
Test new software version in production
Monitor what happens
While rolling out an update gradually
While directing part of the traffic to old and/or new version
Example
Blue/Green Deployments
Deploy new version to blue environment
Smoke tests against the blue system
Switch traffic from green to blue
Switch back to green on errors
Canary releasing
Rollout of a new version only to a subset of production servers
Easy to revert
Use it for A/B testing
Check capacity requirements by incrementally increasing the load
Dark/Shadow Launches
Functionality is deployed in a production env without being visible or activated
Production traffic is duplicated and routed to the shadow version as well
Observing the shadow version without impacting the user
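The traffic splitting behind canary releasing can be sketched as a weighted routing decision (the fraction and names are illustrative):

```python
# Sketch of routing a fraction of production traffic to a canary
# version; increasing the fraction incrementally also checks capacity.

import random

def route(canary_fraction: float, rng: random.Random) -> str:
    # Send a small, adjustable share of requests to the new version.
    return "canary" if rng.random() < canary_fraction else "stable"

rng = random.Random(42)
targets = [route(0.1, rng) for _ in range(1000)]
```

A fraction of 0.0 routes everything to the stable version (trivially easy to revert), while 1.0 completes the rollout.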
Live testing in Fog environment?
What are alternative solutions for the edge part?
Cloud part: ok
Difficult on edge devices which
May not have the capacity to run two versions in parallel
May have safety requirements which make canary releases impossible
Find a separate solution for the edge part
Mock edge devices in the cloud
Have a physical testbed
Deploying cloud applications vs. Deploying fog applications
Deploying cloud applications
Changes are pushed to devices via IaC
New virtual devices are created, configured and deployed with new version
Old instances are disconnected/terminated
Deploying fog applications
Edge devices often need to be physically connected at least once for deploying the first version
Use an app store-like approach
Update is sent to central software repository
Deployed application frequently checks for updates and self-updates if necessary => pull approach
Plan with incompatibilities and different versions on devices
Use versioned interfaces
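The pull approach above boils down to the deployed application comparing its installed version against the central repository (a minimal sketch; the version tuples stand in for a real repository query):

```python
# Sketch of the pull approach: the deployed edge application
# periodically asks a central software repository for the latest
# version and self-updates if necessary.

def check_for_update(installed: tuple, repository_latest: tuple) -> bool:
    # Semantic versions compare component-wise as tuples.
    return repository_latest > installed
```

For example, a device on 1.2.0 with 1.3.0 in the repository should self-update, while a device already on the latest version does nothing.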
What is benchmarking?
What is a benchmarking tool?
Benchmarking is a way to systematically study the quality of cloud services based on experiments
Benchmarking tool creates an artificial load on the SUT, while carefully tracking detailed quality metrics
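The definition above (artificial load plus careful metric tracking) can be sketched in a few lines, with the SUT stubbed as a plain function (an illustration, not a real benchmarking tool):

```python
# Sketch of a benchmarking tool: generate an artificial load against
# the SUT while tracking a quality metric (here: latency percentiles).

import time

def benchmark(sut, requests: int) -> dict:
    latencies = []
    for _ in range(requests):
        start = time.perf_counter()
        sut()                          # issue one artificial request
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "p50": latencies[len(latencies) // 2],
        "p99": latencies[int(len(latencies) * 0.99)],
    }
```

Real tools additionally shape the load (open vs. closed workloads, think times) and log fine-grained per-request data rather than only aggregates.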
What are the benchmarking design objectives?
Relevance
Benchmark the important parts
Mimic real-world use
Repeatability
Maximize determinism in the benchmark
Fairness
Treat all SUTs the same
Portability
Avoid assumptions about the SUT
Make the benchmark broadly applicable
Understandability
Have an intuitive benchmark specification
What are fog-specific benchmarking challenges?
Geo-distribution of experiments
Deployment of benchmarking clients for edge-based SUTs
Distributed measurements of QoS
E2E latency in an IoT data processing pipeline
Multi-workload scenarios
Event-driven at the edge
OLAP and OLTP in the cloud
Complex analysis and results
What are the benchmarking implementation objectives?
Correctness
Assert adherence of implementation to specification
Distribution
Build the benchmarking tool for distributed deployments
Keep coordination to the pre-benchmark phase
Consider clock synchronization
Fine-grained logging
Never discard information if not absolutely necessary
Reproducibility
Use repeatable benchmarks
Repeat often
Run sufficiently long
Document setting
Portability
Use adapter design
Consider extensibility and evolution
Avoid assumptions on the SUT
Ease of use
Document everything
Provide instructions
Release code
Platforms & Applications / Basic Design Principles
State-of-the-Art: Cloud systems?
Microservice-based design
Infrastructure automation
Fault-tolerance through replication
Cluster-based deployment only in a few datacenters
Fog: single-node to cluster sized deployments on millions of sites
Platforms & Applications / Basic Design Principles
Geo-awareness in the cloud vs. Geo-awareness in fog
Geo-awareness in the cloud
Limited to large regions
High latency if the closest data center is quite far
Introducing fog nodes
Fast connection to nearby fog nodes but limited bandwidth to cloud
Access points of mobile devices must be adapted based on their location
Geo-awareness
Infrastructure needs to expose location and network topology explicitly
Platforms & Applications / Basic Design Principles
Fault tolerance for cloud applications?
Fault tolerance in fog applications?
Fault tolerance in cloud applications
Redundant servers
Retry-on-error principle (with other service instances)
Monitor services and their workload, auto-scaling
Chaos-Monkey randomly shuts down services to check if the system adapts and catches outage
Fault tolerance in fog applications
The prevalence of faults depends on the number of nodes
Systems and/or their components fail continuously
Connection infrastructure fails or operates with reduced quality
Power outage
Some devices transmit data under certain conditions (sunlight)
Eventual consistency problems may result in stale datasets
Buffer messages until their receiver is available again
Expect data staleness and ordering issues
Cache data aggressively
Compress data items as much as possible on unreliable connections
Plan with incompatibility, constantly monitor software versions on devices
Design for loose coupling
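One of the tactics above, buffering messages for an unavailable receiver, can be sketched as follows (interfaces are illustrative):

```python
# Sketch of buffering messages until the receiver is available again,
# delivering them in order once it reconnects.

from collections import deque

class BufferedSender:
    def __init__(self):
        self.buffer = deque()

    def send(self, msg: bytes, receiver_up: bool, deliver) -> None:
        self.buffer.append(msg)
        if receiver_up:
            self.flush(deliver)

    def flush(self, deliver) -> None:
        # Drain in order once the receiver is reachable again.
        while self.buffer:
            deliver(self.buffer.popleft())
```

Note that ordering is only preserved per sender; across senders the application still has to expect staleness and reordering, as stated above.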
Platforms & Applications / Basic Design Principles
Geo-awareness in fog applications: What requirements?
Must be aware of its deployment location
Needs to handle client movement (handover to other edge devices)
Must be prepared to move components elsewhere (stateless application logic)
Must move data when necessary
May not rely on the availability of remote components
Case Studies / DeFog
Motivation
Application can be deployed in different ways
Various hardware options exist on the edge
How can we compare them?
Deployment options
Three deployment modes
Cloud only
Edge only
Cloud-Edge (Fog)
Docker as deployment vehicle
Approach
Use a set of representative benchmark applications
Measure E2E performance as well as low-level metrics
6 applications
Latency critical
Bandwidth intensive
Location aware
Compute intensive
Case Studies / BeFaaS
Benchmarking fog-based FaaS platforms
Federated deployments
Different cloud providers
Workloads
E-commerce application
IoT application
Case Studies / MockFog
How evaluate a fog application?
Without testing infrastructure
Guesses, small local testbeds, and simulation
Operate additional edge machines
Expensive, must be at same sites as production machines
Idea: Use an emulated fog infrastructure testbed that is set up in the cloud
Size/power of VMs: cloud instance types and Docker resource limits
Network characteristics: tc, iptables, etc.
MockFog
Three modules
Infrastructure emulation
Application management
Experiment orchestration
Comprises a finite set of states
Failure testing
Node Manager and Node Agents