
1. Distributed systems (DS): fundamental characteristics


2. Network communication:
a) Service hierarchies in communication systems.

Description of the layers:
-Application: this layer gives users and programs access to the network through interfaces and
services such as e-mail and shared databases.
-Presentation: deals with the syntax and semantics of the transmitted information. It handles
tasks such as translation between data formats and encryption and decryption of the byte stream.
-Session: the layer that opens, maintains and synchronizes the connection (session) between
different machines. Its services are dialog control, synchronization services and token
management.
-Transport: this layer receives the data from the layer above and delivers it end to end to the
corresponding process on the destination host. Other functions:
1. service-point addressing: choosing the correct port, so that the message reaches the right process.
2. connection control: connection-oriented (a connection with the transport layer of the
destination host is established before data is delivered) or connectionless (each packet is
treated as independent).
3. segmentation: the message is divided into segments, each tagged with a sequence number so that
it can be reassembled.
4. flow control: end-to-end regulation of the packet flow.
5. error control: checking that the message arrives and is passed on without errors; errors are
corrected by retransmission.
-Network: logical addressing and routing. Logical network addresses are mapped to physical (MAC)
addresses. Also takes part in congestion control.
-Data link: turns the raw physical link into a reliable one. The data is divided into frames, and
errors introduced on the physical link are detected so that they are not passed up to the Network layer.
-Physical: operates on bits, which it transmits over the communication channel. It defines
characteristics such as data rate, physical topology and line configuration.

b) Transport protocols (TCP and UDP)


c) Sockets - the interface to network services for application programming
Sockets are used to connect a client and a server through the network services. The server
listens on a given port and waits for connections from clients; to connect, the client must
know the server's IP address (or host name) and port.
There are 4 types of sockets:
-stream sockets: guarantee delivery and ordering of the data. They use TCP; if the data cannot
be delivered, the sender receives an error indication.
-datagram sockets: do not guarantee delivery; no connection has to be opened, the socket
simply builds a packet and sends it. They use UDP.
-raw sockets: give access to the underlying protocols; used mainly for developing new protocols.
-sequenced packet sockets: similar to stream sockets, but record boundaries are preserved; they
also allow packet headers to be manipulated.
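The client/server exchange over a stream socket described above can be sketched as follows. This is a minimal illustration, not a production server: it assumes Python's standard `socket` module, runs both sides in one process on the loopback address, and the function names `echo_once` and `run_echo_demo` are made up for the example.

```python
import socket
import threading


def echo_once(server_sock: socket.socket) -> None:
    """Server side: accept one client and echo back everything it sends."""
    conn, _addr = server_sock.accept()
    with conn:
        while True:
            chunk = conn.recv(1024)
            if not chunk:          # empty bytes: the client finished sending
                break
            conn.sendall(chunk)


def run_echo_demo(message: bytes) -> bytes:
    # SOCK_STREAM selects a stream (TCP) socket.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        server_sock.listen(1)
        host, port = server_sock.getsockname()

        worker = threading.Thread(target=echo_once, args=(server_sock,))
        worker.start()

        # Client side: it needs the server's address and port to connect.
        with socket.create_connection((host, port)) as client_sock:
            client_sock.sendall(message)
            client_sock.shutdown(socket.SHUT_WR)   # signal "no more data"
            chunks = []
            while True:
                chunk = client_sock.recv(1024)
                if not chunk:
                    break
                chunks.append(chunk)

        worker.join()
        return b"".join(chunks)
```

Calling `run_echo_demo(b"hello")` returns the same bytes that were sent. Replacing `SOCK_STREAM` with `SOCK_DGRAM` would give a datagram (UDP) socket, which has no `listen`/`accept` step and no delivery guarantee.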
d) Broadcasting, multicasting, unicasting

1. Unicast - used when there is a single sender and a single receiver: one-to-one
communication.
2. Broadcast - information is sent from a single source to all destinations in a
network: one-to-all communication. There are two types: limited broadcast (to all
nodes of the sender's own network) and directed broadcast (from a node in network A
to all nodes in network B).
3. Multicast - one or more senders transmit to a group of one or more receivers.
Multicast needs the support of additional protocols (IGMP, multicast routing).

Besides the protocols listed above, communication also needs another important
protocol, ARP (Address Resolution Protocol), which translates an IP address into a
physical address.
The network devices involved can also be named here: switches and routers.
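The three addressing modes can be told apart from the destination address. The sketch below is only an illustration, assuming IPv4 and Python's standard `ipaddress` module; the helper names `classify` and `directed_broadcast` are made up for the example.

```python
import ipaddress


def classify(address: str) -> str:
    """Classify an IPv4 destination address without knowing its netmask."""
    ip = ipaddress.IPv4Address(address)
    if ip == ipaddress.IPv4Address("255.255.255.255"):
        return "limited broadcast"   # all nodes of the sender's own network
    if ip.is_multicast:              # 224.0.0.0/4 is reserved for multicast groups
        return "multicast"
    # A directed broadcast address also ends up here, because it can only be
    # recognised when the target network's prefix length is known.
    return "unicast"


def directed_broadcast(network: str) -> str:
    """Return the directed broadcast address of a network in CIDR notation."""
    return str(ipaddress.IPv4Network(network).broadcast_address)
```

For example, `classify("224.0.0.1")` gives `"multicast"`, and `directed_broadcast("192.168.1.0/24")` gives `"192.168.1.255"`.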
3) Decentralization spaces
a) Enslow's distributed space

At least four physical components of a system might be distributed: hardware or processing logic,
data, the processing itself, and the control.

Definition of distributed data processing systems. This definition has five components:

* A multiplicity of general-purpose resource components, including both physical and logical
resources, that can be assigned to specific tasks on a dynamic basis. Homogeneity of physical
resources is not essential.
* A physical distribution of these physical and logical components of the system interacting through
a communication network. (A network uses a two-party cooperative protocol to control the transfers
of information.)
* A high-level operating system that unifies and integrates the control of the distributed
components. Individual processors each have their own local operating system, and these may be
unique.
* System transparency, permitting services to be requested by name only. The server does not have
to be identified.
* Cooperative autonomy, characterizing the operation and interaction of both physical and logical
resources.

Database decentralization:

Control decentralization:

Hardware decentralization:

Flynn's classification:

Flynn's classification divides computers into four major groups:

An SISD computing system is a uniprocessor machine capable of executing a single
instruction stream operating on a single data stream. In SISD, machine instructions are
processed sequentially, and computers adopting this model are popularly called
sequential computers.

An SIMD system is a multiprocessor machine capable of executing the same instruction on
all the CPUs, each operating on a different data stream. Machines based on the SIMD model
are well suited to scientific computing, since it involves many vector and matrix operations.

An MISD computing system is a multiprocessor machine capable of executing different
instructions on different PEs (processing elements), all of them operating on the same
data set.
Example: Z = sin(x) + cos(x) + tan(x)
The system performs different operations on the same data set. Machines built on the
MISD model are not useful for most applications; a few have been built, but none of
them is available commercially.

An MIMD system is a multiprocessor machine capable of executing multiple
instructions on multiple data sets. Each PE in the MIMD model has separate instruction and
data streams; therefore machines built using this model are suited to any kind of
application. Unlike SIMD and MISD machines, PEs in MIMD machines work asynchronously.

MIMD classification
Multiple Instruction, Multiple Data (MIMD) refers to a parallel architecture that is probably
the most basic, and also the most familiar, type of parallel processor. Its key objective is to
achieve parallelism.
In this organization, all processors in a parallel computer can execute different
instructions and operate on different data at the same time.

In MIMD, each processor has a separate program, and an instruction stream is generated
from each program.
Skillicorn's classification
Shared-memory and distributed-memory computers
Jensen's control decentralization space;

Degree of involvement:
a) Maximally centralized: a single controller carries out a particular instance of the
activity on the resource
b) Maximally decentralized: every controller for the activity is involved in every instance
of it
Degree of equality:
a) Maximally centralized: a single controller has the entire responsibility and authority for
the activity in question
b) Maximally decentralized: every controller is equally capable of participating in the activity
Number of controllers:
a) Maximally centralized
b) Maximally decentralized

For the first factor ->

a) Maximum centralization: each subset is managed by one controller, independently of the other
controllers
b) Maximum decentralization: each controller participates, together with every other controller,
in the multilateral management of at least one resource

Factor 2 ->

a) Maximally centralized: none of the system's resources is managed multilaterally
b) Maximally decentralized: all resources in the system are managed multilaterally

4. Concurrency: an inherent property of distributed systems. Programming models

a) Inter-process communication through shared memory is a model in which two or
more processes can access a common memory region. Communication is done via this
shared memory: changes made by one process can be seen by another
process.
b) Data parallelism is a form of parallelization which relies on splitting the
computation by subdividing data across multiple processors in parallel
computing environments. A data-parallel algorithm focuses on distributing
the data across different parallel computing nodes, in contrast to task
parallelism, which aims at subdividing the operations to perform.
c) Message-passing concurrency is concurrency among two or more processes
(here, a process is a flow of control) where there is no shared region between
the processes. Instead, they communicate by passing messages.
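A minimal sketch of the message-passing model, assuming Python's standard `queue` and `threading` modules. Threads stand in for processes here, which is a simplification: the two flows of control exchange data only through the queue, never through a shared variable they both modify. The names `producer`, `consumer` and `run_pipeline` are made up for the example.

```python
import queue
import threading


def producer(outbox: queue.Queue, items: list) -> None:
    for item in items:
        outbox.put(item)       # send one message
    outbox.put(None)           # sentinel: no more messages will follow


def consumer(inbox: queue.Queue, results: list) -> None:
    while True:
        message = inbox.get()  # blocks until a message arrives
        if message is None:
            break
        results.append(message * message)


def run_pipeline(items: list) -> list:
    channel: queue.Queue = queue.Queue()   # the only link between the two flows
    results: list = []                     # written only by the consumer
    threads = [
        threading.Thread(target=producer, args=(channel, items)),
        threading.Thread(target=consumer, args=(channel, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

`run_pipeline([1, 2, 3])` returns `[1, 4, 9]`; the order is preserved because the queue is FIFO and there is a single consumer.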

The data and/or algorithm partitioning perspective

Parallel processing is the division of a problem, presented as a data structure or a set of actions,
among multiple processing components that operate simultaneously. The expected result is a more
efficient completion of the solution to the problem. Its main advantage is the ability to handle tasks
of a scale that would be unrealistic or not cost-effective for other systems.
• Functional parallelism - can be found in problems where a computation can be described in terms of
a series of time-ordered operations. Because each step represents a change of value or effect over
time, a large amount of communication between the components of the solution must be taken into
account, in the form of a flow of data or operations.
• Domain parallelism - involves problems in which a set of almost independent operations has to be
performed on ordered local data.
• Activity parallelism - many components share access to part of a data structure.
Because each component performs independent computations, communication between the processing
components is not required. Nevertheless, the amount of communication is not zero: communication
is required between a component that controls access to the data structure and the processing
components.
Considering only the processing characteristic of the components:
a) Homogeneous systems are based on identical components interacting in accordance with
simple sets of behavioural rules. They represent instances with the same behaviour.
Individually, any component can be swapped with another without noticeable change in the
operation of the system. Usually, homogeneous systems have a large number of
components, which communicate using data exchange operations.
b) Heterogeneous systems are based on different components with specialised behavioural
rules and relations. Basically, the operation of the system relies on the differences between
components, and therefore no component can be swapped with another. In general,
heterogeneous systems are composed of fewer components than homogeneous systems, and
the components communicate through function calls.

(The Communicating Sequential Elements pattern is used when the design problem at hand can be
understood in terms of domain parallelism. The same operations are performed simultaneously on
different pieces of ordered data [3,11]. Operations in each component depend on partial results in
neighbour components.
The Manager-Workers pattern can be considered a variant of the Master-Slave pattern [1] for
parallel systems, introducing an activity parallelism where the same operations are performed on
ordered data. Each component performs the same operations, independent of the processing
activity of other components. Different pieces of data are processed simultaneously.)
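The Manager-Workers idea can be sketched as below. This is an illustrative sketch only, assuming Python's standard `concurrent.futures` module; the functions `split`, `work` and `manager`, and the choice of summing squares as the workers' operation, are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data: list, parts: int) -> list:
    """Manager, step 1: cut the data into roughly equal contiguous pieces."""
    size = max(1, -(-len(data) // parts))   # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def work(piece: list) -> int:
    """Worker: the same operation on every piece, independent of other workers."""
    return sum(x * x for x in piece)


def manager(data: list, workers: int = 4) -> int:
    """Manager: hand out the pieces, then combine the partial results."""
    pieces = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(work, pieces))
    return sum(partial_results)
```

`manager(list(range(10)))` returns 285, the sum of the squares of 0 to 9. A thread pool is used here for simplicity; for CPU-bound work in CPython a process pool would be the more likely choice, with the same structure.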

Multi-threading models for request handling;

the data and/or algorithm partitioning perspective

Item b) Architecture and concurrency

• Definition of concurrency
Concurrency is regarded as a property of systems in which 2 or more processes are in progress
at the same time and their execution contexts are not disjoint. The specification of a
concurrent architecture must include a description of the cooperation between processes.

• Advantages and disadvantages of a concurrent architecture

Advantages                                            | Disadvantages
Increased execution speed                             | Additional overhead
A dedicated thread for the GUI                        | Complexity
Parallel access to databases and web pages            | Synchronization
Isolation of special algorithms in separate threads   | Communication
Execution is yielded to threads that have resources   | Organizing the switching of execution

• Side effects of nondeterminism (deadlock, livelock, starvation, race condition)

Deadlock - a situation where a set of processes is blocked because each process is holding
a resource and waiting for another resource acquired by some other process.
Livelock - a condition where two or more threads keep repeating a particular
piece of code without making progress. Livelock occurs when one thread keeps responding to the
other thread and the other thread does the same. (Remember the doorway example: two people
stand in front of a door, each insisting "After you", "Oh no, you first".)
Starvation - a problem encountered in concurrent computing where a process is
perpetually denied the resources it needs to do its work. For example, suppose an object
provides a synchronized method that often takes a long time to return. If one thread invokes
this method frequently, other threads that also need frequent synchronized access to the same
object will often be blocked.
Race condition - occurs when two or more threads can access shared data and they try to
change it at the same time. Because the thread scheduling algorithm can swap between
threads at any time, you don't know the order in which the threads will attempt to access the
shared data. Therefore, the result of the change in data depends on the thread scheduling
algorithm, i.e. both threads are "racing" to access/change the data.
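One common way to remove a race condition is to protect the shared data with a lock. The sketch below assumes Python's standard `threading` module; the `Counter` class and the `run` helper are made up for the illustration.

```python
import threading


class Counter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # "value += 1" is a read-modify-write sequence. Holding the lock makes
        # it a critical section that only one thread can execute at a time.
        with self._lock:
            self.value += 1


def run(threads: int, increments: int) -> int:
    counter = Counter()

    def task() -> None:
        for _ in range(increments):
            counter.increment()

    workers = [threading.Thread(target=task) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value
```

With the lock in place, `run(4, 10000)` always returns 40000. Without it, updates from different threads could overwrite each other and the final value would depend on the scheduling order.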

• Platforms for concurrency. Implementing threads (in Java, C#)

• 1:1 (kernel-level threading), in which the threads created by the user correspond to threads scheduled by
the kernel (implemented in the Windows API, the Native POSIX Thread Library, etc.);
• N:1 (user-level threading), in which the threads created by applications are scheduled in user
space on top of a single kernel-level thread (implemented in GNU Portable Threads);
• N:M (hybrid threading), a compromise between the previous two models (aimed at exploiting
multiprocessing), in which libraries are responsible for scheduling multiple user-level
threads onto the available kernel-level threads (implemented in Windows 7, Tera/Cray MTA, etc.);
8) Technologies for distributed semi-structured data: non-relational databases
a. Reasons for using non-relational databases
Reasons for creating NoSQL databases:

• Big Data: collect, store, organize, analyze, share.
• Scalability: scaling up (vertically) means:
o Increasing server capacity
o Adding more CPU, RAM
o Managing is hard
• Data format
• Manageability
Relational databases appeared first; in them, information is arranged in rows and columns,
each associated with a key. As a result, SQL proved too rigid and limited a system for
representing complex data, as well as unstructured data.
NoSQL emerged because of the large flow of user data, which was growing exponentially.
The common reading of the name for this way of organizing data is "Not Only SQL".
The NoSQL model uses a distributed data system with an ad-hoc organization of the data.
Problems for RDBMS:

• ORM (object-relational mapping) does not work that well.
• Rigid schemas.
• Accommodating data growth is hard.
• Replication.
Differences between SQL and NoSQL:

• SQL is table-oriented; NoSQL is oriented towards documents, collections and graphs.
• SQL scales vertically; NoSQL scales horizontally.
• SQL: predefined schema; NoSQL: dynamic schema.
• SQL: based on ACID; NoSQL: based on BASE.
Advantages of NoSQL:

• Cheap, easy to implement
• Data are replicated and can be partitioned
• Easy to distribute
• CAP
Disadvantages:

• No guaranteed support
• Too many options
• No standard language

b. Brewer's CAP theorem

CAP (Consistency, Availability, Partition tolerance) - a distributed database cannot
provide more than 2 of the 3 characteristics at the same time.
The characteristics that Brewer describes, and claims cannot all hold simultaneously, are:
• Consistency: the data is consistent; after an update, all users must see the same
data. Consistency can be combined with either A or P, but not with both at once.
• Availability: the system is always up, and every request receives a response.
• Partition tolerance: even if communication between servers is lost, the system continues
to function, because it is backed by replica databases.

c. ACID versus BASE principles from the perspective of distributed systems

When it comes to NoSQL databases, data consistency models can sometimes be strikingly
different from those used by relational databases (as well as quite different from other
NoSQL stores).

• The key ACID guarantee is that it provides a safe environment in which to operate on
your data. The ACID acronym stands for:
Atomic - All operations in a transaction succeed or every operation is rolled back.
Either the entire transaction takes place at once or it doesn't happen at all.

Consistent - Ensures that only valid data, following all rules and constraints, is written
to the database. When a transaction results in invalid data, the database reverts to its
previous state.

Isolation - Ensures that transactions are securely and independently processed at
the same time without interference, but it does not ensure the order of transactions.
In a database system where several transactions are being executed
simultaneously and in parallel, the property of isolation states that each
transaction will be carried out and executed as if it were the only transaction in the
system. No transaction will affect the existence of any other transaction.

Durability - The database should be durable enough to hold all its latest updates
even if the system fails or restarts. If a transaction updates a chunk of data in a
database and commits, then the database will hold the modified data. If a
transaction commits but the system fails before the data could be written to
the disk, then that data will be updated once the system springs back into
action.

• BASE
Basically available - The system is guaranteed to be available for
querying by all users. (No isolation here.) Focuses on availability of data
even in the presence of multiple failures.
Soft state - The state of the system could change over time.
One of the basic ideas behind BASE is that data consistency is the developer's
problem and should not be handled by the database.
Eventual consistency - means that if no further updates are made to a given
database item for a long enough period of time, all users will see the same
value for that item.
• The only requirement NoSQL databases have with regard to consistency is that, at some
point in the future, the data must converge to a consistent state.
However, no guarantees are made about when this will happen.

BASE                        | ACID
Availability matters most   | Less availability
Best effort                 | Strong consistency
Simple and fast             | Complex
Optimistic                  |
Weaker consistency          |
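Eventual consistency can be illustrated with two replicas that accept writes independently and later exchange their state. The sketch below is a toy model, not how any particular NoSQL product works: the `Replica` class is hypothetical, and the "last write wins" rule based on a caller-supplied timestamp is an assumption chosen for the example.

```python
class Replica:
    """A toy replica that stores, per key, a (timestamp, value) pair."""

    def __init__(self) -> None:
        self.store = {}

    def write(self, key: str, value: str, timestamp: int) -> None:
        current = self.store.get(key)
        # Last write wins: keep the entry with the newest timestamp.
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key: str):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other: "Replica") -> None:
        """Pull every entry from another replica, keeping the newest per key."""
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)
```

If replica `a` receives `write("cart", "book", 1)` and replica `b` receives `write("cart", "book+pen", 2)`, their reads disagree at first (soft state). After `a.sync_from(b)` and `b.sync_from(a)`, and with no further updates, both return `"book+pen"`.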

d. Essential NoSQL models

i. Key/Value

Each unique identifier is stored as a key with its associated value. The
value can be any sort of byte array, data structure, or binary large object
(BLOB).
Example: shopping carts: all the shopping info can be stored in the value,
with the user id as the key;
user preferences (language, color, timezone etc.) can be stored as the
value under key = userID.

(Redis, Oracle NoSQL, Amazon SimpleDB)
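The user-preferences example can be sketched with a toy in-memory store. This does not use the API of any of the products listed; `KeyValueStore`, `save_preferences`, `load_preferences` and the `prefs:<user id>` key format are all assumptions made for the illustration.

```python
import json
from typing import Optional


class KeyValueStore:
    """A toy in-memory key/value store; values are opaque bytes to the store."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def save_preferences(store: KeyValueStore, user_id: str, prefs: dict) -> None:
    # The application, not the store, decides how the value is encoded.
    store.put(f"prefs:{user_id}", json.dumps(prefs).encode("utf-8"))


def load_preferences(store: KeyValueStore, user_id: str) -> dict:
    raw = store.get(f"prefs:{user_id}")
    return {} if raw is None else json.loads(raw.decode("utf-8"))
```

The store only supports lookups by key; it cannot answer a question such as "which users chose language = ro" without scanning and decoding every value, which is the usual trade-off of this model.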


ii. Column-oriented: a column-oriented database is a database management
system (DBMS) that stores data tables by column rather than by row.

(Ex: Cassandra)
Column-oriented databases organize data by field,
keeping all of the data associated with a field next to each other in
memory.
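The difference between a row layout and a column layout can be shown with plain Python structures. This is only a conceptual sketch; the sample records and the helpers `to_columns` and `average` are made up, and real column stores add compression and on-disk formats that are not modelled here.

```python
# Row-oriented layout: each record keeps all of its fields together.
ROWS = [
    {"id": 1, "name": "Ana", "age": 30},
    {"id": 2, "name": "Ion", "age": 25},
    {"id": 3, "name": "Maria", "age": 35},
]


def to_columns(rows: list) -> dict:
    """Column-oriented layout: all values of one field are stored together."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


def average(columns: dict, field: str) -> float:
    # An aggregate over one field touches a single list, not every record.
    values = columns[field]
    return sum(values) / len(values)
```

`to_columns(ROWS)["age"]` is `[30, 25, 35]`, and `average(to_columns(ROWS), "age")` is `30.0`.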

iii. Document-oriented

(MongoDB, RavenDB, CouchDB).

To use for: event logging, e-commerce apps, content management systems
iv. Graph-oriented
9) Web services
a) The HTTP protocol: methods and headers
Safe methods are HTTP methods that do not modify resources (OPTIONS, GET, HEAD, TRACE).

An idempotent HTTP method is an HTTP method that can be called many times without different
outcomes (OPTIONS, GET, HEAD, TRACE, DELETE, PUT).

GET
The GET method requests a representation of the specified resource. Requests
using GET should only retrieve data.

HEAD
The HEAD method asks for a response identical to that of a GET request, but
without the response body.

POST
The POST method is used to submit an entity to the specified resource, often
causing a change in state or side effects on the server.

PUT
The PUT method replaces all current representations of the target resource
with the request payload.

DELETE
The DELETE method deletes the specified resource.

CONNECT
The CONNECT method establishes a tunnel to the server identified by the target
resource.

OPTIONS
The OPTIONS method is used to describe the communication options for the
target resource.

TRACE
The TRACE method performs a message loop-back test along the path to the
target resource.

PATCH
The PATCH method is used to apply partial modifications to a resource.
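The difference between an idempotent method (PUT, DELETE) and a non-idempotent one (POST) can be illustrated with a toy in-memory resource collection. No real HTTP server is involved; `ResourceStore` and its integer ids are assumptions made for the sketch.

```python
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}


class ResourceStore:
    """A toy collection of resources addressed by integer id."""

    def __init__(self) -> None:
        self.items = {}
        self._next_id = 1

    def put(self, item_id: int, body: str) -> int:
        # PUT replaces the resource at a known id: repeating it changes nothing.
        self.items[item_id] = body
        return item_id

    def post(self, body: str) -> int:
        # POST creates a new resource on every call: repeating it adds duplicates.
        while self._next_id in self.items:
            self._next_id += 1
        item_id = self._next_id
        self.items[item_id] = body
        return item_id

    def delete(self, item_id: int) -> None:
        # DELETE leaves the same final state however many times it is called.
        self.items.pop(item_id, None)
```

Calling `put(10, "a")` twice leaves one resource, while calling `post("b")` twice creates two. This is why a client can safely retry a PUT or DELETE after a timeout, but not a POST.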


b) The REST architectural style
REST defines 6 architectural constraints which make a web service a truly RESTful
API.

a. Client-server: the client and the server have different sets of concerns.
The server stores and/or manipulates information and makes it available to the
user in an efficient manner. The client takes that information and displays it to the
user and/or uses it to perform subsequent requests for information. This
separation of concerns allows the client and the server to evolve
independently, as it only requires that the interface stays the same.
b. Stateless: the communication between the client and the server
always contains all the information needed to perform the request. There is no
session state on the server; it is kept entirely on the client's side. If access to a
resource requires authentication, then the client needs to authenticate itself with
every request.
c. Cacheable: the client, the server and any intermediary components can all cache
resources in order to improve performance.
d. Uniform interface: this simplifies the architecture, as all components follow the
same rules to speak to one another.
e. Layered system: individual components cannot see beyond the immediate layer
with which they are interacting. This means that a client connecting to an
intermediate component, like a proxy, has no knowledge of what lies beyond.
This allows components to be independent and thus easily replaceable or
extendable.
f. Code on demand (optional): the server can extend the client's functionality by
sending it executable code, such as scripts.

c) Caching and proxying


Web proxy caching enables you to store copies of frequently-accessed web objects
(such as documents, images, and articles) and then serve this information to users on
demand. It improves performance and frees up Internet bandwidth for other tasks.
Internet users direct their requests to web servers all over the Internet. A caching
server must act as a web proxy server so it can serve those requests. After a web proxy
server receives requests for web objects, it either serves the requests or forwards them to
the origin server (the web server that contains the original copy of the requested
information). The Traffic Server proxy supports explicit proxy caching, in which the user’s
client software must be configured to send requests directly to the Traffic Server proxy. The
following overview illustrates how Traffic Server serves a request.
Caching is typically more complex than the preceding overview suggests. In
particular, the overview does not discuss how Traffic Server ensures freshness, serves
correct HTTP alternates, and treats requests for objects that cannot/should not be cached.
The following sections discuss these issues in greater detail.
HTTP object freshness: decide whether an object held in the cache can still be served, by checking:
- the Expires header and the max-age directive of the Cache-Control header
- the Last-Modified header
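A freshness check based on these headers can be sketched as follows. This is a simplified illustration, not Traffic Server's algorithm: it assumes Python's standard `email.utils` module for date parsing, takes header names exactly as written in the dictionary, and ignores heuristics based on Last-Modified. The function names are made up for the example.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Extract the max-age value (in seconds) from a Cache-Control header."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age" and value.strip().isdigit():
            return int(value)
    return None


def freshness_lifetime(headers: dict) -> Optional[float]:
    """Seconds the response may be served from cache, or None if unknown."""
    max_age = parse_max_age(headers.get("Cache-Control", ""))
    if max_age is not None:          # max-age takes precedence over Expires
        return float(max_age)
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return (expires - date).total_seconds()
    return None


def is_fresh(headers: dict, age_seconds: float) -> bool:
    lifetime = freshness_lifetime(headers)
    # Without explicit freshness information, treat the object as stale
    # so that it gets revalidated with the origin server.
    return lifetime is not None and age_seconds < lifetime
```

For a response carrying `Cache-Control: public, max-age=60`, `is_fresh(headers, 30)` is true and `is_fresh(headers, 90)` is false.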
The following actions can be applied to cached objects:

• Revalidation
• Scheduled updates
• Pushing
• Cache-control
Congestion control: the Congestion Control option enables you to configure Traffic
Server to stop forwarding HTTP requests to origin servers when they become congested.
Traffic Server then sends the client a message to retry the congested origin server later.
1. Set the variable proxy.config.http.congestion_control.enabled to 1.
2. Create rules in the congestion.config file to specify:
o which origin servers Traffic Server tracks for congestion
o the timeouts Traffic Server uses, depending on whether a server is congested
o the page Traffic Server sends to the client when a server becomes congested
o if Traffic Server tracks the origin servers per IP address or per hostname
3. Run the command traffic_line -x to apply the configuration changes.
• Caching in distributed systems
Distributed caching refers to the ability in a distributed system to access data
from within the distributed system itself instead of relying on a separate system of record.
o Generally, the caching is performed in the main memory of the machines that
make up the distributed system; main memory is potentially augmented by
high performance storage, such as flash memory.
o Information is typically replicated, partitioned (a.k.a. sharded), invalidated, or
any combination thereof. Data is said to be replicated when a distributed
caching system proactively makes multiple copies of that data for achieving
some combination of availability and locality-of-reference. Data is said to be
partitioned when a distributed caching system allocates sub-sets of the data
to different machines and is able to subsequently route data requests for the
appropriate sub-sets to each corresponding machine; partitioning can be
static (e.g. memcache) or dynamic (e.g. Coherence). Data is said to be
invalidated when an action or event in a distributed system determines that all
copies of a cached piece of information should be discarded.
o Generally, a distributed cache is accessed by primary key (as in a key/value
store or document model), but some may also support being queried by other
criteria.
o Information that is owned by other systems of record is typically accessed via
a cache-aside model in which the application reads the data and then places
it into the cache, or via a cache-through model in which the cache itself is
responsible for communicating with the system of record.
A distributed cache tracks the modification timestamps of the cached files; the files
should not be modified while a job that uses them is executing.
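The cache-aside model mentioned above, in which the application reads from the system of record and then places the data into the cache, can be sketched as follows. The `CacheAside` class is hypothetical, and the loader function passed to it stands in for whatever reads the real system of record.

```python
class CacheAside:
    """The application checks the cache first and fills it on a miss."""

    def __init__(self, load_from_source) -> None:
        self._load = load_from_source   # hypothetical reader of the system of record
        self._cache = {}
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]     # cache hit: no trip to the source
        self.misses += 1
        value = self._load(key)         # cache miss: read the system of record
        self._cache[key] = value        # ...and place the result in the cache
        return value

    def invalidate(self, key) -> None:
        # Discard the cached copy so that the next read reloads it.
        self._cache.pop(key, None)
```

With a plain dictionary standing in for the system of record (`cache = CacheAside(db.__getitem__)`), a second `get` for the same key does not increase `misses`. If the record changes in the source, the cache keeps returning the old value until `invalidate` is called, which is why invalidation is listed among the core operations.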

• HTTP caching
d) Programming frameworks that facilitate web-service-oriented development

• Java / JAX-RS - the Jersey framework


The Jersey framework is more than the JAX-RS Reference Implementation. Jersey provides its
own API that extends the JAX-RS toolkit with additional features and utilities to further simplify
RESTful service and client development. Jersey also exposes numerous extension SPIs so that
developers may extend Jersey to best suit their needs.
The goals of the Jersey project can be summarized in the following points:

i) Track the JAX-RS API and provide regular releases of production-quality Reference
Implementations that ship with GlassFish;

ii) Provide APIs to extend Jersey and build a community of users and developers; and finally

iii) Make it easy to build RESTful web services utilising Java and the Java Virtual Machine.

• .NET / ASP.NET Web API (2)


ASP.NET Web API is a framework for building HTTP services that can be
accessed from any client including browsers and mobile devices. It is an
ideal platform for building RESTful applications on the .NET Framework.

A model is an object that represents the data in your application. ASP.NET Web API
can automatically serialize your model to JSON, XML, or some other format, and then
write the serialized data into the body of the HTTP response message.  Most clients
can parse either XML or JSON. Moreover, the client can indicate which format it
wants by setting the Accept header in the HTTP request message.

In Web API, a controller is an object that handles HTTP requests.

From JavaScript, the controller's methods can then be called: