Set of flashcards Details

Flashcards 72
Language English
Category Computer Science
Level University
Created / Updated 28.11.2020 / 17.07.2021
Weblink
https://card2brain.ch/box/20201128_dbt

Cloud Computing Characteristics (NIST)

  • On-Demand Self-Service

  • Broad Network Access

  • Resource Pooling

  • Rapid Elasticity

  • Measured Service

Why is cloud computing not enough?

  • Requires continuous connectivity

  • Too high latency

  • Bandwidth limitations

  • Regulations / privacy requirements

What is the edge?

Outskirt of an administrative domain

What is Fog Computing?

What does it provide?

  • Extension of the cloud model

    • applications can reside on multiple layers of a network's topology

  • Combining cloud resources with edge devices and potential intermediary nodes in the network

 

  • Provides the ability to analyze data near the edge

    • to improve efficiency or

    • to operate while disconnected from a larger network

  • Cloud services can be used for tasks that require more resources or elasticity

Fog Computing Characteristics

  • Runs required computations near the end-user

  • Uses lower latency storage at or near the edge

  • Uses low latency communication

  • Implements elements of management

  • Uses Cloud for strategic tasks

  • Multi-tenancy on a massive scale is required for some use cases

  • Geo Distributed

    • Physical location is significant

    • A dynamic pool of sites => unreliable connections between sites

    • Sites may be resource-constrained

Which areas benefit from Fog Computing?

  • Data Collection, Analytics & Privacy

  • Security

    • Moving security closer to the edge => higher performance security applications

  • Compliance Requirements

    • Geofencing, data sovereignty, copyright enforcement

  • Real-Time

Challenges to Adoption of Fog Computing (inherent)

  • General

    • They result from the very idea of using fog resources

    • Technical constraints

      • limits of computational power

    • Logical constraints

      • tradeoffs in distributed systems

    • Market constraints

      • there are currently no managed edge services

  • No Edge Services

  • Lack of Standardized Hardware

  • Management Effort

  • Managing QoS

    • IoT or autonomous cars have stronger quality requirements

    • More problems (network latency/partitioning, message loss/reordering) in non-centralized systems

  • No Network Transparency

Challenges to Adoption of Fog Computing (external)

  • General

    • They result from external entities

    • Government agencies

    • Attackers

  • Physical Security

    • E.g., attaching hardware to the top of a street light pole instead of at eye level

    • Protection against fire and vandalism

  • Legal and Regulatory Requirements

    • Data needs to be held in a certain physical location (eHealth)

    • Liquid Fog-based applications might have trouble fulfilling certain aspects of privacy regulations

Synchronous Communication

  • Example: phone call, method call in Java

  • Requires both parties to be online

  • The caller must wait, and both server and client need to be alive

  • Disadvantages

    • Higher probability of failures

    • Difficult to identify and react to failures

    • The one-to-one system is not practical for complex interactions

  • Finding out when the failure took place is not easy

Asynchronous Communication

  • Clients can do other things when they are waiting

  • Examples: Email, JavaScript callbacks
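
The difference can be sketched in a few lines of Python. This is only an illustration under assumptions: `slow_request` is a made-up stand-in for a remote call, and the 0.05 s delay is arbitrary. The client sends the request, keeps working, and collects the reply later.

```python
import asyncio


async def slow_request(delay):
    # Hypothetical remote call; sleeping stands in for network and server time.
    await asyncio.sleep(delay)
    return "reply"


async def client():
    log = []
    # Start the request without waiting for its result.
    pending = asyncio.create_task(slow_request(0.05))
    log.append("request sent")
    # The client is free to do other things while the reply is pending.
    log.append("doing other work")
    reply = await pending
    log.append(f"got {reply}")
    return log


if __name__ == "__main__":
    print(asyncio.run(client()))
```

A synchronous caller would instead block on the request before it could record "doing other work".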

Types of decoupling

  • Space (Location)

  • Time

  • Technology

  • Data Format

Messaging patterns

  • Request / Response (1 to 1)

  • Load Balancing (1 to many)

  • Fan-out / Fan-in (1 to many / many to 1)

  • Broadcasting (many to many)

  • Pub/Sub (many to many, but structured)

What is Pub/Sub Messaging?

  • Clients can act as Publisher or Subscriber or both

  • Communication is many-to-many

Pub/Sub: Matching of Events and Subscriptions

  • Channel-based (low level of expressiveness)

  • Topic-based

  • Content-based (high level of expressiveness)

Pub/Sub: Broker vs. P2P

  • The broker handles client communication centrally

  • P2P clients have to route messages themselves

  • Broker-based setups are a good fit for fog
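
A minimal sketch of a broker-based setup, assuming exact topic matching and in-process callbacks instead of network clients. The `Broker` class and its method names are hypothetical and only illustrate that publishers and subscribers never address each other directly.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-memory broker with exact topic matching."""

    def __init__(self):
        # topic -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # The broker, not the publisher, decides who receives the event.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)


if __name__ == "__main__":
    broker = Broker()
    broker.subscribe("room/temp", lambda topic, payload: print(topic, payload))
    broker.publish("room/temp", "21.5")
```

Any client may call both `subscribe` and `publish`, which gives the many-to-many communication described above.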

MQTT Pub/Sub Protocol

  • Lightweight and designed for devices that run in constrained environments

  • Topic-based

  • Broker-based
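
Topic-based matching in MQTT works on `/`-separated topic levels. The cards do not cover wildcards, but MQTT topic filters support `+` (exactly one level) and `#` (all remaining levels, only allowed as the last level). The function below is a simplified sketch of that matching; the name `topic_matches` is made up, and special cases such as topics starting with `$` are ignored.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT-style topic filter matches a topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: valid only as the last level; it also
            # matches the parent level itself ("sport/#" matches "sport").
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    # Without "#", filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)
```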

Inter-Broker Routing Strategies

+ basic description

  • Event Flooding or Subscription Flooding

    • Events/Subscriptions broadcasted to all brokers

    • Minimizes end-to-end latency

    • A lot of excess data

  • Gossiping

    • Messages are distributed based on a probability distribution

    • High tolerance for very dynamic environments

    • Messages might not arrive at all or with high delay

  • Selective - Filtering

    • Good if not all brokers are interconnected

    • Subscription information is exchanged with neighbors

    • Events are only forwarded to brokers that lie on a path to a subscription

  • Selective - Rendezvous Points (RP)

    • RPs are the meeting points for events and subscriptions

    • Must be close to clients -> otherwise high end-to-end latency
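
The selective filtering strategy can be sketched as follows. This is an illustration under assumptions, not a real broker: the class `FilteringBroker` is hypothetical, the broker topology is assumed to be acyclic (a tree), and matching is by exact topic. Subscription information is passed to neighbors, and events follow only links that lead to a subscription.

```python
from collections import defaultdict


class FilteringBroker:
    """Hypothetical broker in an acyclic overlay using filtering-based routing."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.local_topics = set()        # topics with subscribers at this broker
        self.routes = defaultdict(set)   # topic -> neighbors on a path to a subscription
        self.seen = []                   # events that reached this broker
        self.delivered = []              # events handed to local subscribers

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)

    def subscribe(self, topic, origin=None):
        if origin is None:
            self.local_topics.add(topic)
        else:
            self.routes[topic].add(origin)
        # Exchange subscription information with all other neighbors.
        for neighbor in self.neighbors:
            if neighbor is not origin:
                neighbor.subscribe(topic, origin=self)

    def publish(self, topic, event, origin=None):
        self.seen.append(event)
        if topic in self.local_topics:
            self.delivered.append(event)
        # Forward only towards brokers that lie on a path to a subscription.
        for neighbor in self.routes.get(topic, ()):
            if neighbor is not origin:
                neighbor.publish(topic, event, origin=self)
```

With brokers A-B-C in a line and D attached to B, a subscription at C and an event published at A travels A, B, C and never reaches D. Event flooding would have sent it to D as well.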

Case Studies / Broadcast Groups

  • Combining flooding and rendezvous points

    • Global flooding

      • Broadcast messages to all brokers

      • Communication latency is optimal, but a lot of excess data

    • Rendezvous point in the cloud

      • Fog brokers forward events to a central cloud broker

        • => cloud decides which other fog brokers need events

      • Minimizes excess data, but increases latency

    • Tradeoff between latency and excess data dissemination

Case Studies / Broadcast Groups / Broadcast group formation

  • Initially, each broker takes the role of a leader

  • Leaders subscribe to a dedicated topic at the cloud RP to detect other leaders

  • Leaders measure latency to other leaders

    • If below a given latency threshold => merge

    • Merge: determine new group leader (e.g. based on compute resources)

    • Migrate members to new leader

  • If the latency to a leader is above the given latency threshold, leave the group

  • Latency threshold controls group size

  • Can be used to manage the latency vs. excess data tradeoff
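
The merge step can be sketched as below. This is a simplified, centralized illustration and not the actual distributed protocol: the function `form_groups` and its inputs are hypothetical, only leader-to-leader latency is considered, the new leader is picked by a single "compute resources" number, and the leave step is omitted.

```python
def form_groups(latency, capacity, threshold):
    """Merge leaders whose mutual latency is below the threshold.

    latency:   dict mapping frozenset({a, b}) -> measured latency (e.g. ms)
    capacity:  dict mapping broker -> compute resources (higher wins leadership)
    threshold: latency threshold that controls the group size
    """
    # Initially, each broker takes the role of a leader of its own group.
    groups = {broker: {broker} for broker in capacity}
    merged = True
    while merged:
        merged = False
        leaders = sorted(groups)
        for i, a in enumerate(leaders):
            for b in leaders[i + 1:]:
                if latency[frozenset((a, b))] < threshold:
                    # Determine the new leader and migrate all members to it.
                    new, old = (a, b) if capacity[a] >= capacity[b] else (b, a)
                    groups[new] |= groups.pop(old)
                    merged = True
                    break
            if merged:
                break
    return groups
```

A low threshold yields many small groups (less excess data, more traffic via the cloud RP). A high threshold yields few large groups (more flooding inside each group).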

Case Studies / Vehicular Fog Computing

  • Vehicles

    • Collect data

    • Use it for vehicle-level decisions

    • Transmit data to closest fog nodes

      • Asynchronous Request/Reply or Fan-Out

  • Fog nodes

    • Process data of multiple cars for area-level decisions

    • Send instructions to traffic lights

      • Synchronous Fan-In / Fan-Out

    • Send aggregated status reports to cloud

      • Synchronous Fan-In

  • Traffic lights

    • Operate as defined by instructions

  • Cloud

    • Processes data from fog nodes for city-level decisions

    • There might be an internal load balancer

    • Publish traffic information to subscribed vehicles

      • Pub/Sub

Case Studies / DisGB

  • IoT data distribution is often non-uniform

  • It depends on where events are relevant / where relevant events can come from

  • Can be expressed with geo-context

  • Idea: use geo-contexts to identify RPs (two strategies)

    • The event geofence can be used to identify RPs that are close to the subscribers of an event

      • The RPs for an event are the brokers closest to each subscriber that has created a matching subscription

        • Subscriptions are not distributed

        • Similar to flooding events, events are distributed

    • The subscription geofence can be used to identify RPs that are close to the publishers of events

      • The RP for an event is the broker closest to the publisher of that event

        • Events are not distributed

        • Similar to flooding subscriptions, subscriptions are distributed
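
Both strategies boil down to picking the broker closest to a client. The sketch below is a strong simplification: it skips the geofences themselves, assumes the client locations are already known, and uses Euclidean distance on made-up plane coordinates. All function names and the broker names are hypothetical.

```python
import math


def closest_broker(location, brokers):
    """Return the name of the broker nearest to a location.

    brokers: dict mapping broker name -> (x, y) position
    """
    return min(brokers, key=lambda name: math.dist(location, brokers[name]))


def rps_near_subscribers(subscriber_locations, brokers):
    # Strategy 1: one RP per matching subscriber, so the event is distributed.
    return {closest_broker(loc, brokers) for loc in subscriber_locations}


def rp_near_publisher(publisher_location, brokers):
    # Strategy 2: a single RP at the publisher, so subscriptions are distributed.
    return closest_broker(publisher_location, brokers)


if __name__ == "__main__":
    brokers = {"north": (0, 0), "east": (10, 0), "south": (5, -8)}
    print(rps_near_subscribers([(1, 1), (9, 1)], brokers))
    print(rp_near_publisher((5, -7), brokers))
```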

What is replication?

  • Is a common strategy in data management and in distributed systems

  • Main idea

    • maintain multiple copies of an entity (called replicas)

    • on multiple servers

    • for better availability

    • and performance

  • Keeping replicas consistent is costly

Why do we need replication?

  • System availability / Fault-tolerance

    • Failure resilience is critical in any enterprise system

    • Keeping several copies of the server -> single failures should not affect the overall availability

    • Redundancy allows switch over in case of failures

  • Replicas can protect against corrupted data (voting)

  • Performance / Scalability

    • Large workloads can be spread and balanced across distributed replicas

    • Local access is fast, remote access is slow

      • Keep copies in clients’ proximity
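
Protection against corrupted data through voting can be illustrated with a majority read. This is a minimal sketch, assuming a non-empty list of hashable values, one per replica. The function name `read_with_voting` is made up.

```python
from collections import Counter


def read_with_voting(replica_values):
    """Return the value that a strict majority of replicas agree on."""
    value, votes = Counter(replica_values).most_common(1)[0]
    if votes <= len(replica_values) // 2:
        raise ValueError("no majority among replicas")
    return value
```

With three replicas, a single corrupted copy is outvoted by the two intact ones.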

What are three different replication scenarios?

  • Replicating servers on a common resource may help availability if there is a replicated cache coherence mechanism

  • To get improvement in availability the resources must be replicated too

  • Replicated servers and resource replicas are not necessarily tightly coupled

What is replica consistency?

  • A read from any replica returns the result of the latest write to the logical data store

  • Consistency is expensive, hence different consistency models exist

What do CAP and PACELC mean?

  • CAP

    • Consistency

    • Availability

    • Partition tolerance

  • PACELC

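    • In case of a Partition (P), trade off Availability (A) against Consistency (C); Else (E), i.e. when the system is running normally without partitions, trade off Latency (L) against Consistency (C)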

Characterizing consistency?

  • Staleness

    • How much is a given replica lagging behind?

  • Ordering

    • How much does the operation serialization order deviate among replicas?

Data-centric consistency models

  • Sequential Consistency

    • All replicas execute all updates in the same order

  • Causal consistency

    • All replicas execute causally related operations in the same order; concurrent requests are executed in arbitrary order

  • Eventual Consistency

    • In the absence of updates and failures, all replicas converge towards the same state

Client-centric consistency models

  • Monotonic Reads

    • A read will never return older values than previously returned to the same client

  • Read Your Writes

    • A read will never return older values than previously written by the same client

  • Write Follows Reads

    • A client that reads version X and then updates the same data item will only update replicas that have at least version X

  • Monotonic Writes

    • Two updates of the same client will always be serialized in the chronological order of their submission
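
One way to picture these guarantees is a client session that remembers the highest version it has read or written and refuses replicas that lag behind it. This is a sketch under assumptions: there is a single data item, versions are plain integers, and `Replica`, `Session` and `StaleReplicaError` are hypothetical names. Real systems typically track versions per item, for example with vector clocks.

```python
class StaleReplicaError(Exception):
    """Raised when a replica is older than what the session has already seen."""


class Replica:
    def __init__(self):
        self.version = 0
        self.value = None


class Session:
    def __init__(self):
        # Highest version this client has read or written so far.
        self.min_version = 0

    def read(self, replica):
        # Monotonic Reads and Read Your Writes: never accept an older version.
        if replica.version < self.min_version:
            raise StaleReplicaError("replica is behind this session")
        self.min_version = replica.version
        return replica.value

    def write(self, replica, value):
        # Write Follows Reads and Monotonic Writes: only update replicas that
        # have at least the version this session has already seen.
        if replica.version < self.min_version:
            raise StaleReplicaError("replica is behind this session")
        replica.version += 1
        replica.value = value
        self.min_version = replica.version
```

A client that hits a stale replica could retry at another replica or wait until the update has been propagated.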

What are the two parameters when designing a replication strategy?

  • When updates are propagated

  • Where updates are propagated

Replication strategies / When: What are the two options?

And how do they work?

  • Synchronous (eager)

    • Propagates changes to the data immediately to all existing copies (before the commit)

    • The ACID properties can apply to all replica updates

    • Data copies are consistent at all times and at all sites

    • On update: consult with everybody else and only if an agreement among sites is reached the data is updated

      • However, the system is unavailable for updates if only a single replica cannot be reached

  • Asynchronous (lazy)

    • Updates are first executed and committed on the local copy, then the changes are propagated

      • During propagations, copies are inconsistent

    • The update is eventually propagated to all sites (push/pull) and assuming no conflicts arise, the data eventually becomes consistent
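
The two options can be contrasted in a small sketch. It is a toy model under assumptions: replicas are in-memory objects with a `reachable` flag, there are no transactions or conflicts, and lazy propagation assumes the remote copies are reachable when `propagate` runs. All names are hypothetical.

```python
class Replica:
    def __init__(self, reachable=True):
        self.value = None
        self.reachable = reachable


def write_eager(replicas, value):
    """Synchronous: update all copies or none of them."""
    if not all(r.reachable for r in replicas):
        # A single unreachable replica makes the system unavailable for updates.
        return False
    for r in replicas:
        r.value = value
    return True


class LazyGroup:
    """Asynchronous: commit locally first, propagate later."""

    def __init__(self, local, remotes):
        self.local = local
        self.remotes = remotes
        self.pending = []

    def write(self, value):
        self.local.value = value      # committed on the local copy immediately
        self.pending.append(value)    # copies are inconsistent until propagated
        return True

    def propagate(self):
        # Push queued updates in order; assumes remotes are reachable now.
        for value in self.pending:
            for r in self.remotes:
                r.value = value
        self.pending.clear()
```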

Replication strategies / Where: What are the two options?

And how do they work?

  • Primary copy (master)

    • Updates can originate at only one copy; all other copies (secondaries) are updated to reflect the changes to the master

    • Secondary copies are read-only

  • Updates everywhere (group)

    • Changes can be initiated at any of the copies

Advantages and Disadvantages of synchronous replication?

  • Advantages

    • No inconsistencies (identical copies)

    • Reading the local copy yields the most up-to-date value

    • Changes are atomic

  • Disadvantages

    • An operation has to update all sites

      • Longer execution time

      • Worse response time

      • Poor availability

Advantages and Disadvantages of asynchronous replication?

  • Advantages

    • An operation is always local

      • Good response time

      • High availability

  • Disadvantages

    • Data inconsistencies

    • A local read does not always return the most up-to-date value

    • Changes to all copies are not guaranteed

    • Replication is not transparent

Advantages and Disadvantages of update everywhere replication?

  • Advantages

    • Any site can run an operation

    • Load is evenly distributed

  • Disadvantages

    • Copies must be synchronized

    • Concurrent updates will cause conflicts

Advantages and Disadvantages of primary copy replication?

  • Advantages

    • No inter-site synchronization

    • There is always one site that has all the updates

  • Disadvantages

    • The load at the primary copy can be quite large

    • Reading the local copy may not yield the most up-to-date value

Synchronous + Primary copy

Advantages/Disadvantages?

Practical?

  • Advantages

    • Updates do not need to be coordinated

    • No inconsistencies

  • Disadvantages

    • Longest response time

    • Only useful with few updates

    • Local copies are read-only

    • Low availability

  • Ideal: Globally correct, Remote writes

  • Practical: Too expensive (usefulness)

Asynchronous + Primary copy

Advantages/Disadvantages?

Practical?

  • Advantages

    • No coordination necessary

    • Short response times

  • Disadvantages

    • Local copies are not up-to-date

    • Inconsistencies

    • Low write availability

  • Ideal: Inconsistent reads

  • Practical: Feasible (limited scalability)

Synchronous + Update everywhere

Advantages/Disadvantages?

Practical?

  • Advantages

    • No inconsistencies

    • Elegant symmetric solution

  • Disadvantages

    • Long response times

    • Updates need to be coordinated

    • Low availability

  • Ideal: Globally correct, Local writes

  • Practical: Too expensive (does not scale)

Asynchronous + Update everywhere

Advantages/Disadvantages?

Practical?

  • Advantages

    • No centralized coordination

    • Shortest response times

    • High availability

  • Disadvantages

    • Inconsistencies and conflicts

    • Updates can be lost (reconciliation)

  • Ideal: Inconsistent reads, Reconciliation

  • Practical: Feasible in many applications
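
Why updates can be lost during reconciliation is easy to see with last-writer-wins, one common reconciliation rule. The cards do not name a specific rule, so treat this as an assumed example. Each copy is modelled as a `(timestamp, value)` pair, and the function name is made up.

```python
def reconcile_lww(copy_a, copy_b):
    """Last-writer-wins: keep the copy with the newer timestamp.

    Each copy is a (timestamp, value) tuple; on equal timestamps copy_a is kept.
    """
    return max(copy_a, copy_b, key=lambda copy: copy[0])


if __name__ == "__main__":
    # Two sites updated the same item concurrently.
    site_1 = (100, "shipping address: Berlin")
    site_2 = (101, "shipping address: Munich")
    # After reconciliation both sites hold site_2's value; site_1's update is lost.
    print(reconcile_lww(site_1, site_2))
```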