Deck details
Cards | 72 |
---|---|
Language | English |
Category | Computer Science |
Level | University |
Created / Updated | 28.11.2020 / 17.07.2021 |
Web link | https://card2brain.ch/box/20201128_dbt |
Cloud Computing Characteristics (NIST)
On-Demand Self-Service
Broad Network Access
Resource Pooling
Rapid Elasticity
Measured Service
Why is cloud computing not enough?
Requires continuous connectivity
Too high latency
Bandwidth limitations
Regulations / privacy requirements
What is the edge?
Outskirt of an administrative domain
What is Fog Computing?
What does it provide?
Extension of the cloud model
Applications can reside on multiple layers of a network's topology
Combining cloud resources with edge devices and potential intermediary nodes in the network
Provides the ability to analyze data near the edge for
improving efficiency or
to operate while disconnected from a larger network
Cloud services can be used for tasks that require more resources or elasticity
Fog Computing Characteristics
Runs required computations near the end-user
Uses lower latency storage at or near the edge
Uses low latency communication
Implements elements of management
Uses Cloud for strategic tasks
Multi-tenancy on a massive scale is required for some use cases
Geo Distributed
Physical location is significant
A dynamic pool of sites => unreliable connections between sites
Sites may be resource-constrained
In which areas does Fog Computing benefit?
Data Collection, Analytics & Privacy
Security
Moving security closer to the edge => higher-performance security applications
Compliance Requirements
Geofencing, data sovereignty, copyright enforcement
Real-Time
Challenges to Adoption of Fog Computing (inherent)
General
They result from the very idea of using fog resources
Technical constraints
limits of computational power
Logical constraints
tradeoffs in distributed systems
Market constraints
there are currently no managed edge services
No Edge Services
Lack of Standardized Hardware
Management Effort
Managing QoS
IoT or autonomous cars have stronger quality requirements
More problems (network latency/partitioning, message loss/reordering) in non-centralized systems
No Network Transparency
Challenges to Adoption of Fog Computing (external)
General
They result from external entities
Government agencies
Attackers
Physical Security
E.g., attaching hardware to the top of a street light pole instead of at eye level
Protection against fire and vandalism
Legal and Regulatory Requirements
Data needs to be held in a certain physical location (eHealth)
Liquid Fog-based applications might have trouble fulfilling certain aspects of privacy regulations
Synchronous Communication
Example: phone call, method call in Java
Requires both parties to be on-line
The caller must wait, and both server and client need to be alive
Disadvantages
Higher probability of failures
Difficult to identify and react to failures
The one-to-one system is not practical for complex interactions
Finding out when the failure took place is not easy
Asynchronous Communication
Clients can do other things when they are waiting
Examples: Email, JavaScript callbacks
Types of decoupling
Space (Location)
Time
Technology
Data Format
Messaging patterns
Request / Response (1 to 1)
Load Balancing (1 to many)
Fan-out / Fan-in (1 to many / many to 1)
Broadcasting (many to many)
Pub/Sub (many to many, but structured)
What is Pub/Sub Messaging?
Clients can act as Publisher or Subscriber or both
Communication is many-to-many
Pub/Sub: Matching of Events and Subscriptions
Channel-based (low level of expressiveness)
Topic-based
Content-based (high level of expressiveness)
Pub/Sub: Broker vs. P2P
The broker handles client communication centrally
P2P clients have to route messages themselves
Broker-based setups are a good fit for fog
MQTT Pub/Sub Protocol
Lightweight and designed for devices that run in constrained environments
Topic-based
Broker-based
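The topic-based, broker-based model can be illustrated with a minimal in-memory sketch. It is not an MQTT client or broker; it only imitates MQTT-style topic filters, where `+` matches exactly one topic level and `#` matches all remaining levels. The names `topic_matches` and `Broker` are hypothetical, and special cases such as topics starting with `$` are ignored.

```python
from collections import defaultdict


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches `topic_filter` (MQTT-style wildcards)."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches everything from here on.
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            # "+" matches any single level; otherwise levels must be equal.
            return False
    return len(filter_levels) == len(topic_levels)


class Broker:
    """Hypothetical in-memory broker: clients never talk to each other directly."""

    def __init__(self):
        self.subscriptions = defaultdict(list)  # topic filter -> callbacks

    def subscribe(self, topic_filter, callback):
        self.subscriptions[topic_filter].append(callback)

    def publish(self, topic, payload):
        """Deliver the event to every matching subscription; return the count."""
        delivered = 0
        for topic_filter, callbacks in self.subscriptions.items():
            if topic_matches(topic_filter, topic):
                for callback in callbacks:
                    callback(topic, payload)
                    delivered += 1
        return delivered


received = []
broker = Broker()
broker.subscribe("city/+/temperature", lambda t, p: received.append((t, p)))
broker.publish("city/berlin/temperature", 21.5)
```

Publisher and subscriber only share the topic name, which is the space decoupling listed earlier in the deck.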
Inter-Broker Routing Strategies
+ basic description
Event Flooding or Subscription Flooding
Events/Subscriptions broadcasted to all brokers
Minimizes end-to-end latency
A lot of excess data
Gossiping
Messages are distributed based on a probability distribution
High tolerance for very dynamic environments
Messages might not arrive at all or with high delay
Selective - Filtering
Good if not all brokers are interconnected
Subscription information is exchanged with neighbors
Events are only forwarded to brokers that lie on a path to a subscription
Selective - Rendezvous Points (RP)
RPs are the meeting points for events and subscriptions
Must be close to clients -> otherwise high end-to-end latency
Case Studies / Broadcast Groups
Combining flooding and rendezvous points
Global flooding
Broadcast messages to all brokers
Communication latency is optimal, but a lot of excess data
Rendezvous point in the cloud
Fog brokers forward events to a central cloud broker
=> cloud decides which other fog brokers need events
Minimizes excess data, but increases latency
Tradeoff between latency and excess data dissemination
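The tradeoff can be made concrete with a deliberately simplified counting model. Assumptions (not from the deck): one event is published at one fog broker, `interested` other fog brokers have matching subscribers, and "excess" counts deliveries to fog brokers without a matching subscriber. The function names are hypothetical.

```python
def flooding(num_brokers: int, interested: int) -> dict:
    """Global flooding: the event goes directly to every other broker."""
    messages = num_brokers - 1
    return {"messages": messages, "excess": messages - interested, "hops": 1}


def cloud_rendezvous(interested: int) -> dict:
    """Cloud RP: one message to the cloud, then one per interested broker."""
    return {"messages": 1 + interested, "excess": 0, "hops": 2}


# 10 fog brokers, 3 of them have subscribers for the event.
flood_result = flooding(10, 3)
rp_result = cloud_rendezvous(3)
```

In this model flooding needs one hop but sends 6 unnecessary messages, while the cloud RP sends none but needs two hops, one of them over the high-latency link to the cloud.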
Case Studies / Broadcast Groups / Broadcast group formation
Initially, each broker takes the role of a leader
Leaders subscribe to a dedicated topic at the cloud RP to detect other leaders
Leaders measure latency to other leaders
If below a given latency threshold => merge
Merge: determine new group leader (e.g. based on compute resources)
Migrate members to new leader
If latency to a leader is above given latency threshold, leave group
Latency threshold controls group size
Can be used to manage the latency vs. excess data tradeoff
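The merge step can be sketched as follows. This is a rough sketch under stated assumptions, not the published algorithm: latencies are given as a static table keyed by broker pairs, the broker with more compute resources becomes the new leader, and the "leave group" step is omitted. `form_groups` and all broker names are hypothetical.

```python
def form_groups(latency: dict, resources: dict, threshold: float) -> dict:
    """Return a mapping leader -> set of group members (leader included)."""
    # Initially, each broker takes the role of a leader of its own group.
    groups = {broker: {broker} for broker in resources}
    merged = True
    while merged:
        merged = False
        leaders = sorted(groups)
        for i, a in enumerate(leaders):
            for b in leaders[i + 1:]:
                if latency[frozenset((a, b))] < threshold:
                    # Merge: the leader with more resources keeps the role.
                    new_leader = max((a, b), key=lambda x: resources[x])
                    old_leader = b if new_leader == a else a
                    # Migrate all members of the old group to the new leader.
                    groups[new_leader] |= groups.pop(old_leader)
                    merged = True
                    break
            if merged:
                break
    return groups


names = ["A", "B", "C", "D"]
latency = {frozenset((x, y)): 50 for x in names for y in names if x < y}
latency[frozenset(("A", "B"))] = 5   # A and B are close to each other
latency[frozenset(("C", "D"))] = 4   # C and D are close to each other
resources = {"A": 4, "B": 8, "C": 2, "D": 1}

groups = form_groups(latency, resources, threshold=10)
```

Raising `threshold` produces fewer, larger groups (more flooding inside a group); lowering it produces more, smaller groups (more traffic via the cloud RP).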
Case Studies / Vehicular Fog Computing
Vehicles
Collect data
Use it for vehicle-level decisions
Transmit data to closest fog nodes
Asynchronous Request/Reply or Fan-Out
Fog nodes
Process data of multiple cars for area-level decisions
Send instructions to traffic lights
Synchronous Fan-In / Fan-Out
Send aggregated status reports to cloud
Synchronous Fan-In
Traffic lights
Operate as defined by instructions
Cloud
Processes data from fog nodes for city-level decisions
There might be an internal load balancer
Publish traffic information to subscribed vehicles
Pub/Sub
Case Studies / DisGB
IoT data distribution is often non-uniform
It depends on where events are relevant / where relevant events can come from
Can be expressed with geo-context
Idea: use geo-contexts to identify RPs (two strategies)
The event geofence can be used to identify RPs that are close to the subscribers of an event
The RPs for an event are all brokers that are the respectively closest broker to each of the subscribers that have created a matching subscription
Subscriptions are not distributed
Similar to flooding events, events are distributed
The subscription geofence can be used to identify RPs that are close to the publishers of events
The RP for an event is the broker closest to the publisher of that event
Events are not distributed
Similar to flooding subscriptions, subscriptions are distributed
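Both strategies reduce to a "closest broker" lookup, sketched below. Assumptions: brokers, publishers and subscribers are points in a plane with Euclidean distance, and the list of subscribers with a matching subscription is already known; the actual geofence checks are left out. All names are hypothetical.

```python
import math


def closest_broker(brokers: dict, location: tuple) -> str:
    """Return the name of the broker nearest to `location`."""
    return min(brokers, key=lambda name: math.dist(brokers[name], location))


def rps_near_subscribers(brokers: dict, matching_subscribers: list) -> set:
    """Strategy 1: one RP per matching subscriber, so events are distributed."""
    return {closest_broker(brokers, loc) for loc in matching_subscribers}


def rp_near_publisher(brokers: dict, publisher: tuple) -> str:
    """Strategy 2: a single RP next to the publisher, so subscriptions are distributed."""
    return closest_broker(brokers, publisher)


brokers = {"north": (0, 0), "east": (10, 0), "south": (5, -10)}
subscribers = [(1, 1), (9, 1), (0.5, 0)]

event_rps = rps_near_subscribers(brokers, subscribers)
publisher_rp = rp_near_publisher(brokers, (5, -8))
```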
What is replication?
Is a common strategy in data management and in distributed systems
Main idea
maintain multiple copies of an entity (called replicas)
on multiple servers
for better availability
and performance
Keeping replicas consistent is costly
Why do we need replication?
System availability / Fault-tolerance
Failure resilience is critical in any enterprise system
Keeping several copies of the server -> single failures should not affect the overall availability
Redundancy allows switch over in case of failures
Replicas can protect against corrupted data (voting)
Performance / Scalability
Large workloads can be spread and balanced across distributed replicas
Local access is fast, remote access is slow
Keep copies in clients’ proximity
What are three different replication scenarios?
What is replica consistency?
A read from any replica returns the result of the latest write to the logical data store
Consistency is expensive, hence different consistency models exist
What does CAP and PACELC mean?
CAP
Consistency
Availability
Partition tolerance
PACELC
If Partition: Availability vs. Consistency (as in CAP); Else: Latency vs. Consistency (when the system is running normally / absence of partitions)
Characterizing consistency?
Staleness
How much is a given replica lagging behind?
Ordering
How much does the operation serialization order deviate among replicas?
Data-centric consistency models
Sequential Consistency
All replicas execute all updates in the same order
Causal consistency
All replicas execute causally related operations in the same order; concurrent requests are executed in arbitrary order
Eventual Consistency
In the absence of updates and failures, all replicas converge towards the same state
Client-centric consistency models
Monotonic Reads
A read will never return older values than previously returned to the same client
Read Your Writes
A read will never return older values than previously written by the same client
Writes Follow Reads
A client that reads version X and then updates the same data item will only update replicas that have at least version X
Monotonic Writes
Two updates by the same client are always serialized in the chronological order of their submission
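One common way to enforce such guarantees is to let the client session remember the newest version it has seen. The sketch below covers Monotonic Reads and Read Your Writes only, and assumes a single integer version per data item assigned by one primary replica. `Replica`, `Session` and `StaleReplicaError` are hypothetical names.

```python
class StaleReplicaError(Exception):
    """The replica is older than what this session has already seen."""


class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Ignore updates that are not newer than the current state.
        if version > self.version:
            self.version, self.value = version, value


class Session:
    def __init__(self):
        self.min_version = 0  # newest version read or written so far

    def read(self, replica):
        # Refuse replicas that lag behind this session's history.
        if replica.version < self.min_version:
            raise StaleReplicaError("replica is behind this session")
        self.min_version = replica.version
        return replica.value

    def write(self, primary, value):
        version = primary.version + 1
        primary.apply(version, value)
        self.min_version = max(self.min_version, version)
        return version


primary, secondary = Replica(), Replica()
session = Session()
version = session.write(primary, "a")

try:
    session.read(secondary)       # secondary has not received the write yet
    rejected = False
except StaleReplicaError:
    rejected = True

secondary.apply(version, "a")     # lazy propagation catches up
value_after_sync = session.read(secondary)
```

A real client would typically retry on another replica instead of failing; raising an error just keeps the sketch short.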
What are the two parameters when designing a replication strategy?
When updates are propagated
Where updates are propagated
Replication strategies / When: What are the two options?
And how do they work?
Synchronous (eager)
Propagates changes to the data immediately to all existing copies (before the commit)
The ACID properties can apply to all replica updates
Data copies are consistent at all times and at all sites
On update: consult all other sites and update the data only if the sites reach agreement
However, the system is unavailable for updates if only a single replica cannot be reached
Asynchronous (lazy)
First executed and committed on the local copy, then propagates changes
During propagation, copies are inconsistent
The update is eventually propagated to all sites (push/pull) and assuming no conflicts arise, the data eventually becomes consistent
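The difference between the two options can be sketched with a toy store. Assumptions: values are overwritten as a whole, "agreement" is reduced to "all remote replicas are reachable", and lazy propagation is an explicit push. The class and method names are hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.reachable = True


class ReplicatedStore:
    def __init__(self, local, remotes):
        self.local = local
        self.remotes = remotes

    def write_eager(self, value):
        """Synchronous: update all copies before the commit, or refuse."""
        if not all(r.reachable for r in self.remotes):
            return False  # unavailable for updates
        for replica in [self.local, *self.remotes]:
            replica.value = value
        return True

    def write_lazy(self, value):
        """Asynchronous: commit on the local copy only."""
        self.local.value = value
        return True

    def propagate(self):
        """Push the local state to reachable remotes; return how many."""
        updated = 0
        for replica in self.remotes:
            if replica.reachable:
                replica.value = self.local.value
                updated += 1
        return updated


a, b, c = Replica("a"), Replica("b"), Replica("c")
store = ReplicatedStore(a, [b, c])

store.write_eager("x")            # all three copies now hold "x"
c.reachable = False
eager_ok = store.write_eager("y")  # refused: one replica is down
lazy_ok = store.write_lazy("y")    # accepted: only the local copy changes
```

After `write_lazy`, replica `a` holds "y" while `b` and `c` still hold "x" until `propagate()` runs, which is the inconsistency window of lazy replication.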
Replication strategies / Where: What are the two options?
And how do they work?
Primary copy (master)
Updates can originate at only one copy (the primary); all other copies (secondaries) are updated to reflect the changes to the master
Secondary copies are read-only
Updates everywhere (group)
Changes can be initiated at any of the copies
Advantages and Disadvantages of synchronous replication?
Advantages
No inconsistencies (identical copies)
Reading the local copy yields the most up-to-date value
Changes are atomic
Disadvantages
An operation has to update all sites
Longer execution time
Worse response time
Poor availability
Advantages and Disadvantages of asynchronous replication?
Advantages
An operation is always local
Good response time
High availability
Disadvantages
Data inconsistencies
A local read does not always return the most up-to-date value
Changes to all copies are not guaranteed
Replication is not transparent
Advantages and Disadvantages of update everywhere replication?
Advantages
Any site can run an operation
Load is evenly distributed
Disadvantages
Copies must be synchronized
Concurrent updates will cause conflicts
Advantages and Disadvantages of primary copy replication?
Advantages
No inter-site synchronization
There is always one site that has all the updates
Disadvantages
The load at the primary copy can be quite large
Reading the local copy may not yield the most up-to-date value
Synchronous + Primary copy
Advantages/Disadvantages?
Practical?
Advantages
Updates do not need to be coordinated
No inconsistencies
Disadvantages
Longest response time
Only useful with few updates
Local copies are read-only
Low availability
Ideal: Globally correct, Remote writes
Practical: Too expensive (usefulness)
Asynchronous + Primary copy
Advantages/Disadvantages?
Practical?
Advantages
No coordination necessary
Short response times
Disadvantages
Local copies are not up-to-date
Inconsistencies
Low write availability
Ideal: Inconsistent reads
Practical: Feasible (limited scalability)
Synchronous + Update everywhere
Advantages/Disadvantages?
Practical?
Advantages
No inconsistencies
Elegant symmetric solution
Disadvantages
Long response times
Updates need to be coordinated
Low availability
Ideal: Globally correct, Local writes
Practical: Too expensive (does not scale)
Asynchronous + Update everywhere
Advantages/Disadvantages?
Practical?
Advantages
No centralized coordination
Shortest response times
High availability
Disadvantages
Inconsistencies and conflicts
Updates can be lost (reconciliation)
Ideal: Inconsistent reads, Reconciliation
Practical: Feasible in many applications
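Why updates can be lost during reconciliation is easiest to see with last-writer-wins, one possible reconciliation policy (the deck does not prescribe one). The sketch assumes each write carries a timestamp and that the highest timestamp wins; `reconcile_lww` and the record layout are hypothetical.

```python
def reconcile_lww(writes: list) -> tuple:
    """Return (winner, lost): the newest write wins, all others are discarded."""
    winner = max(writes, key=lambda w: w["timestamp"])
    lost = [w for w in writes if w is not winner]
    return winner, lost


# Two sites accept concurrent updates to the same item without coordination.
write_a = {"site": "A", "timestamp": 10, "value": "blue"}
write_b = {"site": "B", "timestamp": 12, "value": "green"}

winner, lost = reconcile_lww([write_a, write_b])
```

Both sites acknowledged their write locally (shortest response time), but after reconciliation only "green" survives and the update from site A is silently lost.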