INTRODUCTORY NOTES ON THE SAS SYSTEM

Fabio Rapallo - Ivano Repetto - Maria Piera Rogantin
Dipartimento di Matematica - Università di Genova
Via Dodecaneso, 35 - 16146 GENOVA (Italy)
tel. +39-010-3536751 fax +39-010-3536752
CONTENTS
A. General aspects
A1. The SAS language
A2. DATA steps
A3. PROC steps
A4. SAS data sets
A5. Types of variables
B. How to run a SAS program
C. Examples of SAS programs
C1. First example
C2. Second example: the PRINT and CONTENTS procedures
C3. Remarks on writing programs
D. The DATA step
D1. Creating a SAS data set
D2. Some examples
D3. Permanent SAS data sets
E. Manipulating data sets
E1. Selecting subsets of observations
E2. Selecting consecutive observations
E3. Selecting variables
E4. Renaming variables
E5. Building several SAS data sets
E6. Concatenating several SAS data sets
E7. Reading data sets of a different "type"
F. More on the DATA step
F1. SAS expressions and functions
F2. Missing values
F3. Cumulative sums
F4. Further details on the execution of a DATA step
F5. Arrays
F6. Control statements
F7. The INPUT statement
F8. The INFILE statement
F9. The OUTPUT statement
F10. Writing to an external file and the PUT statement
G. The PROC step
G1. Some options and statements used in a PROC step
G2. SORT procedure
G3. PRINT procedure
G4. MEANS procedure
G5. FREQ procedure (and FORMAT procedure)
G6. UNIVARIATE procedure
G7. Other elementary statistical procedures
G8. Some procedures that operate on SAS data sets
G9. Selecting variables and observations in a procedure
H. Graphics statements and procedures
H1. Some statements for graphical output
H2. GCHART procedure
H3. GPLOT procedure
I. Errors and reading the LOG
J. Further topics: manipulating SAS data sets
J1. Overview of Methods for Combining SAS Data Sets
J2. Manipulating SAS data sets
J2.1 Concatenating data sets: SET
J2.2 Concatenating data sets: SET - BY and MERGE - BY
J2.3 Placing data sets with different variables side by side: SET - SET and MERGE
J2.4 Updating a data set: UPDATE and MERGE - BY
J2.5 Adding observations to a data set: PROC APPEND
J3. Repeated observations: SET - BY and the first.<..> and last.<..> variables
K. Further topics: reading raw data
K1. Formatted list input
K2. Named input
K3. Holding the input line: use of @
K4. INFILE options for reading delimited data with list input
L. Further topics: formats for reading and writing data
L1. FORMAT statement
L2. INFORMAT statement
L3. LENGTH statement
L4. ATTRIB statement
L5. PROC FORMAT
L5.1 VALUE statement
L5.2 INVALUE statement
L5.3 PICTURE statement
L5.4 Some examples of changing formats
L5.5 Functions for converting a character variable to numeric and vice versa
L6. SAS Date, Time, and Datetime Values
L7. Some rounding functions
L8. Some functions on character variables
M. Further topics: SAS macros
M1. Introduction to macro programming
M2. SAS Macro Language: Reference
N. Further topics: working with matrices in SAS
A. General aspects
A1. THE SAS LANGUAGE
SAS is a software system that provides the tools needed to analyse data. It consists of:
- a language used for data manipulation;
- a library of ready-made procedures for general use.
There is a BASE SAS module plus various modules for particular applications, for example:
statistics (STAT)
quality control (QC)
operations research (OR)
time series (TSA)
matrix manipulation (IML)
advanced graphics (GRAPH)
management of computer resources
database management.
SAS makes it possible to:
- read data
- transform and manipulate data (using mathematical and statistical functions, concatenation and sorting)
- update data
- print reports
- generate graphs
- reduce and summarise data
- carry out mathematical analyses of the data
Every program is made up of steps.
There are two kinds of steps: DATA steps and PROC steps.
A2. DATA STEPS
DATA steps are used to create SAS data sets from existing files. These files may be non-SAS (raw) files or files that are already SAS data sets. The input data can be processed during the DATA step.
[Diagram relating raw data, the DATA step and SAS data sets]
A DATA step begins with the DATA statement, followed by the name of the SAS data set to be built.
A3. PROC STEPS
PROC steps are used to produce listings, reports, statistics, etc.
The data they operate on must already be in SAS data set form.
A PROC step can also create other SAS data sets, which can then be analysed by later DATA or PROC steps.
A PROC step begins with the PROC statement, followed by the name of the procedure to be run.
A program step (DATA or PROC) ends with the statement
RUN;
or with the start of a new DATA or PROC step.
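The alternation of the two kinds of step can be sketched as follows. This is only a minimal illustration: the data set name PROVA and the variables X and Y are hypothetical and do not come from the notes.

```sas
/* Hypothetical data set and variable names, for illustration only */
data prova;               /* DATA step: builds the SAS data set PROVA */
  input x y;
  datalines;
1 2
3 4
;
run;                      /* ends the DATA step */

proc print data=prova;    /* PROC step: works on an existing SAS data set */
run;                      /* ends the PROC step */
```

If the first RUN were left out, the DATA step would still end where the PROC statement begins.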
A4. SAS DATA SETS
A SAS data set is a collection of homogeneous data organised in rectangular form.
[Figure: example of a data table, labelled "variables" and "observations"]
The columns are called variables; each has a name (mnemonic names are always recommended). A name must obey the following syntax rules:
a) it must consist of 1 to 8 characters;
b) it may contain digits.
A SAS data set is made up of two distinct parts:
- a descriptor part, which stores all the information SAS needs to reread the data fully automatically at any time (e.g. names and attributes of the variables, the "history" of how the data set was built, ...);
- a part that stores the data proper.
Different PROCs must be used to display the two parts.
Remark: a SAS data set is not a traditional data file; it can be read only by the software with which it was built.
Every SAS file has a name. File names follow the same rules as variable names.
A5. TYPES OF VARIABLES
A variable can be of two types:
- NUMERIC (e.g. age, height, weight)
- CHARACTER (e.g. surname, sex)
A variable is characterised by a set of attributes:
- name
- length (LENGTH)
- input format (INFORMAT)
- output format (FORMAT)
- label (LABEL)
which can be specified with the appropriate statements.
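As a minimal sketch of how these attributes might be declared, the step below uses the LENGTH, INFORMAT, FORMAT and LABEL statements. The data set PERSONE, its variables and the single data line are hypothetical and serve only as an illustration.

```sas
/* Hypothetical data set; attribute values chosen only for illustration */
data persone;
  length cognome $ 20;              /* LENGTH: storage length of a character variable */
  informat nascita ddmmyy10.;       /* INFORMAT: how the raw value is read */
  format nascita date9. peso 5.1;   /* FORMAT: how the values are printed */
  label peso = 'Peso in kg';        /* LABEL: descriptive label */
  input cognome $ nascita peso;
  datalines;
Rossi 04/12/1998 72.5
;
run;

proc contents data=persone;         /* lists the attributes of each variable */
run;
```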
The maximum length of a variable name (unless otherwise declared) is 8 characters.
Variables take values that depend on the processing being carried out.
In particular situations no value can be associated with a variable (either while reading the data or because of operations on "invalid" data); in this case SAS assigns the variable a special value, called a "missing value".
SAS automatically converts a character variable to numeric when:
- a character variable is assigned to a previously defined numeric variable;
- a character variable is compared with a numeric one;
- arithmetic operations are performed on character variables (only if they consist of digits).
SAS automatically converts a numeric variable to character when:
- a numeric variable is assigned to a previously defined character variable;
- a function is applied to a numeric variable but expects a character argument;
- a numeric variable is used as an operand of an operator typical of character variables (for example the string concatenation operator).
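A short sketch of both kinds of automatic conversion is given below. The data set CONV and its variables are hypothetical. In both cases SAS writes a NOTE in the Log saying that a conversion took place.

```sas
/* Hypothetical names, for illustration only */
data conv;
  c = '12';         /* character variable made of digits */
  n = c * 2;        /* arithmetic on a character value: c is converted, n = 24 */
  s = 'id' || n;    /* concatenation: n is converted to character with the
                       BEST12. format, so its digits are preceded by blanks */
run;

proc print data=conv;
run;
```

When the leading blanks are not wanted, an explicit conversion with the PUT function gives control over the format used.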
B. HOW TO RUN A PROGRAM WITH SAS FOR WINDOWS
When SAS is started from Windows, a screen appears that usually contains two windows: the Log window and the Program Editor window. A third window, the Output window, opens when the program produces non-graphical output.
Some remarks on writing and running programs:
- the program text is typed in the Program Editor window
- to save the program to disk: from the Program Editor window, select from the File menu:
Save or Save as
With Save, the program (after the first save) is stored under the name of the last program opened (take care). The name of a saved program appears in the title bar of the Program Editor window.
- to open a program previously saved in a file: from the Program Editor window, select from the File menu:
Open --> Read File
- to run the program: from the Program Editor window, select from the Local menu:
Submit (function key F8)
- at each run the program text disappears from the Program Editor window, but it can be brought back by selecting, from the Program Editor window, in the Local menu:
Recall (function key F4)
Recall brings back the last program run (repeating the operation twice brings back the last two programs run, and so on)
- while the program runs, the Log window shows what the program is doing, for example the execution times of the procedures, any errors, the number of observations read into the data set, etc.
- any non-graphical output is written to the Output window
- since the windows have a limited size, their contents are not always entirely visible. To scroll inside a window, use the usual operations of Windows applications
- to move from one window to another, use the Window menu, the mouse or the function keys:
- F5 for the Program Editor window
- F6 for the Log window
- F7 for the Output window
- the text editing commands are in the Edit menu
- to clear all the lines of text in any window, select from the Edit menu:
Clear text (keys Ctrl+E)
- to keep the contents of the Log and Output windows in a permanent file, use the Save command, as for saving a program
- to see what the function keys are assigned to, select from the Help menu:
Keys
A Keys window appears with the required information
- in the Help menu, selecting
SAS System
gives the syntax and explanations for the various procedures and for the use of SAS commands
C. EXAMPLES OF SAS PROGRAMS
C1. FIRST EXAMPLE
This program:
- builds a SAS data set named CLASSE by reading the data entered in the program
- sorts the data by one variable
- prints the contents of the data set built
- computes some statistics
SAS PROGRAM no. 1:
DATA CLASSE;
/* variable names are separated by blanks;
   the first two variables are character */
INPUT NOME $ A_CORSO $
      A_NASCIT ES_DATI MEDIA;
/* the data are entered in the program (data read from a file
   are covered later); each line corresponds to one observation */
DATALINES;
XXX 1F 1965 12 95
ZZZ 4R 1966 13 100
WWW 4 1968 12 107
TTT 3 1967 9 100
;
/* works on CLASSE (the last data set created) and
   sorts the observations by the variable ES_DATI */
PROC SORT data=classe;
BY ES_DATI; run;
/* prints the variables of the data set with the given title */
PROC PRINT data=classe;
TITLE 'STUDENTI ORDINATI PER NUMERO ESAMI DATI';
run;   /* causes the PROC step to be executed */
/* prints the information about the data set,
   with the title assigned previously */
PROC CONTENTS data=classe;
/* computes some statistics on all the numeric variables,
   with the previous title */
PROC MEANS data=classe;
RUN;
/* computes some statistics on the variables listed
   after the VAR statement */
PROC MEANS data=classe;
var a_nascit;
RUN;
SAS OUTPUT of the PRINT procedure:
STUDENTI ORDINATI PER NUMERO ESAMI DATI
OBS   NOME   A_CORSO   A_NASCIT   ES_DATI   MEDIA
  1   TTT    3           1967         9      100
  2   XXX    1F          1965        12       95
  3   WWW    4           1968        12      107
  4   ZZZ    4R          1966        13      100
C2. SECOND EXAMPLE: THE PRINT AND CONTENTS PROCEDURES
This program:
- builds a SAS data set named ES1 by reading the data entered in the program
- builds new variables from the original ones
- prints the contents of the data set built (both the data and the description)
SAS PROGRAM no. 2:
data es1;
input sesso $ eta hinch wlib;
altezza=hinch*2.54;
peso=wlib*0.4536;
datalines;
f 14 56.3 85.0
f 15 62.3 105.0
f 15 63.3 108.0
f 16 59.0 92.0
f 19 62.5 112.5
f 17 62.5 112.0
f 18 59.0 104.0
f 14 56.5 69.0
f 16 62.0 94.5
f 14 53.8 68.5
f 13 61.5 104.0
f 17 61.5 103.5
f 15 64.5 123.5
f 14 58.3 93.0
f 14 51.3 50.5
f 14 58.8 89.0
f 19 65.3 107.0
f 15 59.5 78.5
f 14 61.3 115.0
f 18 63.3 114.0
f 14 61.8 85.0
(part of the input is not shown)
m 16 56.8 75.0
m 15 64.8 128.0
m 19 64.5 98.0
m 16 58.0 84.0
m 15 62.8 99.0
m 17 63.8 112.0
m 15 57.8 79.5
m 15 57.3 80.5
m 17 63.5 102.5
m 14 55.0 76.0
m 16 66.5 112.0
m 18 65.0 114.0
m 16 61.5 140.0
m 16 62.0 107.5
;
run;
proc print data=es1;
title ' ';run;
proc contents data=es1;run;
The first two RUN statements are not necessary.
SAS OUTPUT:
The output of PROC PRINT is the following:
OBS   SESSO   ETA   HINCH    WLIB   ALTEZZA      PESO
  1   f        14    56.3    85.0   143.002   38.5560
  2   f        15    62.3   105.0   158.242   47.6280
  3   f        15    63.3   108.0   160.782   48.9888
  4   f        16    59.0    92.0   149.860   41.7312
  5   f        19    62.5   112.5   158.750   51.0300
  6   f        17    62.5   112.0   158.750   50.8032
  7   f        18    59.0   104.0   149.860   47.1744
  8   f        14    56.5    69.0   143.510   31.2984
  9   f        16    62.0    94.5   157.480   42.8652
(part of the output is not shown)
230   m        15    57.3    80.5   145.542   36.5148
231   m        17    63.5   102.5   161.290   46.4940
232   m        14    55.0    76.0   139.700   34.4736
233   m        16    66.5   112.0   168.910   50.8032
234   m        18    65.0   114.0   165.100   51.7104
235   m        16    61.5   140.0   156.210   63.5040
236   m        16    62.0   107.5   157.480   48.7620
The output of PROC CONTENTS is the following:
CONTENTS PROCEDURE
Data Set Name: WORK.ES1                          Observations:         236
Member Type:   DATA                              Variables:            6
Engine:        V611                              Indexes:              0
Created:       10:55 Friday, December 4, 1998    Observation Length:   48
Last Modified: 10:55 Friday, December 4, 1998    Deleted Observations: 0
Protection:                                      Compressed:           NO
Data Set Type:                                   Sorted:               NO
Label:
-----Engine/Host Dependent Information-----
Data Set Page Size:         8192
Number of Data Set Pages:   2
File Format:                607
First Data Page:            1
Max Obs per Page:           169
Obs in First Data Page:     147
-----Alphabetic List of Variables and Attributes-----
#    Variable    Type    Len    Pos
-----------------------------------
5    ALTEZZA     Num       8     32
2    ETA         Num       8      8
3    HINCH       Num       8     16
6    PESO        Num       8     40
1    SESSO       Char      8      0
4    WLIB        Num       8     24
C3. REMARKS ON WRITING PROGRAMS
- statements end with the character " ; "
- all the columns of a line can be used
- several statements can be written on one line (separated, of course, by ;)
- one statement can be written over several lines
- several "RUN" statements can be placed inside a program
- comments are enclosed between /* and */ ( e.g.
PROC SORT;
/* sorting the data */ )
ABBREVIATIONS FOR LISTS OF VARIABLES
a) X1-Xn            all the variables from X1 to Xn ( X1 X2 X3 ... Xn)
b) X--A             all the variables from X to A (in the order in which they are stored in the data set)
   X-NUMERIC-A      all the numeric variables from X to A
   X-CHARACTER-A    all the character variables from X to A
c) _NUMERIC_        all the numeric variables
   _CHARACTER_      all the character variables
   _ALL_            all the variables
EXAMPLE 1:
the INPUT statement of example no. 1 can be written:
INPUT NOME $ A_CORSO $ VAR1-VAR3;
after PROC PRINT one could put the statement:
VAR NOME--VAR3 ;
which would be equivalent to:
VAR NOME A_CORSO VAR1 VAR2 VAR3;
(this statement indicates that the PROC is to be carried out only on the variables listed)
EXAMPLE 2:
data uno;
input x1 x2 y x3 x5;
datalines;
1 2 3 4 5
6 7 8 9 0
;
proc print; var x1--x3; run;
SAS OUTPUT:
Obs   x1   x2   y   x3
  1    1    2   3    4
  2    6    7   8    9
proc print; var x1-x3; run;
SAS OUTPUT:
Obs   x1   x2   x3
  1    1    2    4
  2    6    7    9
proc print; var x1--x5; run;
SAS OUTPUT:
Obs   x1   x2   y   x3   x5
  1    1    2   3    4    5
  2    6    7   8    9    0
proc print; var x1-x5; run;
SAS LOG:
ERROR: Variable X4 in suffix list not in data set.
D. THE DATA STEP
D1. CREATING A SAS DATA SET
DATA IN AN EXTERNAL FILE:
DATA data-set-name ;
INFILE '[path] name ' ;        opens the file for reading
INPUT .........;               describes the input, assigning a name to each variable, with an optional informat
other statements used in the DATA step ;
DATA ENTERED IN THE PROGRAM:
DATA data-set-name ;
INPUT .........;
other statements ;
DATALINES;                     immediately before the data
data lines
;                              marks the end of the data
DATA FROM ANOTHER DATA SET:
DATA name-of-the-new-data-set ;
SET name-of-the-data-set-to-read-from ;
other statements ;
REMARK: besides the SET statement, the MERGE and UPDATE statements can also be used, with a similar result.
D2. SOME EXAMPLES
DATA ENTERED IN THE PROGRAM:
SAS PROGRAM no. 2:
data es1;
input sesso $ eta hinch wlib;
altezza=hinch*2.54;
peso=wlib*0.4536;
datalines;
f 143 56.3 85.0
f 155 62.3 105.0
f 153 63.3 108.0
f 161 59.0 92.0
f 191 62.5 112.5
f 171 62.5 112.0
f 185 59.0 104.0
f 142 56.5 69.0
f 160 62.0 94.5
f 140 53.8 68.5
f 139 61.5 104.0
f 178 61.5 103.5
m 153 57.8 79.5
m 155 57.3 80.5
m 178 63.5 102.5
m 142 55.0 76.0
m 164 66.5 112.0
m 189 65.0 114.0
m 164 61.5 140.0
m 167 62.0 107.5
;
run;
DATA IN AN EXTERNAL FILE:
SAS PROGRAM no. 3:
data es2;
infile 'a:es1.txt';
input sesso $ eta hinch wlib;
altezza=hinch*2.54;
peso=wlib*0.4536;
run;
DATA FROM ANOTHER DATA SET:
SAS PROGRAM no. 4:
data es3;
set es2;
if eta < 16 then cl_eta = 'giovane';
else cl_eta='vecchio';
run;
proc print data=es3;
var eta cl_eta;
run;
D3. PERMANENT SAS DATA SETS
HOW TO MAKE A SAS DATA SET PERMANENT
a) a "library" must be created with the statement:
LIBNAME library-name ' path ';      gives the directory in which the permanent data sets are written
b) when the data set is built, write:
DATA library-name.data-set-name ;
INPUT ......;
.............
The file created has the extension SD2.
The data sets are stored under the name data-set-name.SD2 in the path specified in the LIBNAME statement.
Example.
SAS PROGRAM no. 5:
libname corso 'a:\corsosas';
data corso.es3;
set es2;
if eta < 16 then cl_eta = 'giovane';
else cl_eta='vecchio';
run;
proc print data=corso.es3;
var eta cl_eta;
run;
The permanent data sets are stored in the directory a:\corsosas. The LIBNAME statement holds for all the data sets built in the session.
The data set built is stored in the file a:\corsosas\es3.sd2, which has the structure of a SAS data set.
HOW TO ACCESS A PERMANENT DATA SET
DATA name-of-the-new-data-set ;
SET library-name.data-set-name ;
..........
data corso.nuovo;
set corso.es3;
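Once a library has been assigned, its contents can be inspected without building a new data set. The following is a minimal sketch that reuses the library CORSO and the data set ES3 of program no. 5. The choice of five observations is arbitrary.

```sas
libname corso 'a:\corsosas';           /* same library as in program no. 5 */

proc contents data=corso._all_ nods;   /* lists the data sets stored in the library */
run;

proc print data=corso.es3(obs=5);      /* prints the first 5 observations of a permanent data set */
run;
```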
E. MANIPULATING DATA SETS
Consider the following example.
SAS PROGRAM no. 6:
data corso.disney;
length nome $ 12;
input nome $ & sesso $ eta altezza peso;
datalines;
pippo         m  32  190  54
paperino      m  34  150  50
minnie        f  35  145  40
clarabella    f  30  180  65
nonna papera  f  99  140  55
qui           m   8  120  30
quo           m   8  120  30
qua           m   8  120  30
emy           f   8  117  25
ely           f   8  117  25
edy           f   8  117  25
;
proc print data=corso.disney; run;
SAS OUTPUT:
OBS   NOME           SESSO   ETA   ALTEZZA   PESO
  1   pippo          m        32       190     54
  2   paperino       m        34       150     50
  3   minnie         f        35       145     40
  4   clarabella     f        30       180     65
  5   nonna papera   f        99       140     55
  6   qui            m         8       120     30
  7   quo            m         8       120     30
  8   qua            m         8       120     30
  9   emy            f         8       117     25
 10   ely            f         8       117     25
 11   edy            f         8       117     25
E1. SELECTING SUBSETS OF OBSERVATIONS
Subsets of the observations contained in a data set can be selected in several ways.
Some of these possibilities are shown below, continuing the previous example:
SAS PROGRAM no. 7:
data maschi;               /* the data set built is temporary */
set corso.disney;          /* data set from which a subset is to be selected */
if sesso='m';              /* states the selection criterion */
proc print data=maschi; run;
The statement:
if sesso='m';
can be replaced, equivalently, by either of the statements
if sesso ^='m' then delete;
if sesso ='m' then output;
SAS OUTPUT:
OBS   NOME       SESSO   ETA   ALTEZZA   PESO
  1   pippo      m        32       190     54
  2   paperino   m        34       150     50
  3   qui        m         8       120     30
  4   quo        m         8       120     30
  5   qua        m         8       120     30
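Another way of obtaining the same subset, not used in the programs above, is the WHERE statement. The sketch below reuses the data set CORSO.DISNEY. The output name MASCHI2 is hypothetical.

```sas
/* MASCHI2 is a hypothetical name, chosen so as not to overwrite MASCHI */
data maschi2;
  set corso.disney;
  where sesso='m';     /* only the observations satisfying the condition are read */
run;

proc print data=maschi2;
run;
```

Unlike the subsetting IF, the WHERE condition is applied before an observation is brought into the DATA step, so it can refer only to variables that already exist in the input data set.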
E2. SELECTING CONSECUTIVE OBSERVATIONS
Various options of the SET statement can be used; some possibilities are shown here (again referring to the initial example):
SAS PROGRAM no. 8:
data prime3;
set corso.disney(obs=3);
proc print data=prime3;
data dalla3;
set corso.disney(firstobs=3);
proc print data=dalla3;
data centrali;
set corso.disney(firstobs=3 obs=5);
proc print data=centrali;
run;
SAS OUTPUT:
OBS   NOME           SESSO   ETA   ALTEZZA   PESO
  1   pippo          m        32       190     54
  2   paperino       m        34       150     50
  3   minnie         f        35       145     40

OBS   NOME           SESSO   ETA   ALTEZZA   PESO
  1   minnie         f        35       145     40
  2   clarabella     f        30       180     65
  3   nonna papera   f        99       140     55
  4   qui            m         8       120     30
  5   quo            m         8       120     30
  6   qua            m         8       120     30
  7   emy            f         8       117     25
  8   ely            f         8       117     25
  9   edy            f         8       117     25

OBS   NOME           SESSO   ETA   ALTEZZA   PESO
  1   minnie         f        35       145     40
  2   clarabella     f        30       180     65
  3   nonna papera   f        99       140     55
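The FIRSTOBS= and OBS= data set options are not limited to the SET statement. As a sketch, they can also be attached to the data set named in a PROC step, which avoids building an intermediate data set:

```sas
/* prints observations 3 to 5 of CORSO.DISNEY directly */
proc print data=corso.disney(firstobs=3 obs=5);
run;
```

Note that OBS= gives the number of the last observation to process, not how many observations to read.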
SAS PROGRAM no. 8 bis
data es2bis;
infile 'a:es1.txt' firstobs=3;
input sesso $ eta hinch wlib;
proc print data=es2bis;
SAS OUTPUT:
OBS   SESSO   ETA   HINCH    WLIB
  1   f       153    63.3   108.0
  2   f       161    59.0    92.0
  3   f       191    62.5   112.5
  4   f       171    62.5   112.0
  5   f       185    59.0   104.0
  6   f       142    56.5    69.0
  7   f       160    62.0    94.5
  8   f       140    53.8    68.5
  9   f       139    61.5   104.0
 10   f       178    61.5   103.5
 11   m       153    57.8    79.5
 12   m       155    57.3    80.5
 13   m       178    63.5   102.5
 14   m       142    55.0    76.0
(part of the output is omitted)
E3. SELECTING VARIABLES
Variables are selected with the DROP and KEEP statements.
These statements are complementary and are used to specify:
- which variables of the old data set are not to be copied to the new one (DROP statement);
- which variables of the old data set are to be copied to the new one (KEEP statement).
DROP and KEEP are non-executable statements. They can therefore appear anywhere in a DATA step.
Example:
SAS PROGRAM no. 9:
data etasesso;
set corso.disney;
drop altezza peso;
..............
run;
or, in place of the DROP statement:
keep nome eta sesso;
In this case the new data set no longer contains the variables ALTEZZA and PESO, but these variables can still be used in the statements and in computing new variables (e.g. rapporto=altezza/peso).
DROP and KEEP can also appear as options of an input SAS data set, as follows:
SAS PROGRAM no. 10:
data etasesso;
set corso.disney(drop = altezza peso);
..............
run;
In this case the variables ALTEZZA and PESO are not read at all and cannot be used in any way in the DATA step that builds the new data set.
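The difference between the DROP statement and the DROP= option on the input data set can be made concrete with the ratio mentioned above. This is a sketch: the variable RAPPORTO comes from the text, while the data set name ETASESSO2 is hypothetical.

```sas
/* DROP statement: ALTEZZA and PESO are read and can be used in the step */
data etasesso;
  set corso.disney;
  rapporto = altezza / peso;   /* both variables are still available here */
  drop altezza peso;           /* they are only left out of the output data set */
run;

/* DROP= on the output data set: same effect as the DROP statement */
data etasesso2(drop=altezza peso);
  set corso.disney;
  rapporto = altezza / peso;
run;
```

With DROP= on the SET statement instead, the assignment to RAPPORTO would produce only missing values, because ALTEZZA and PESO would never be read.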
E4. RENAMING VARIABLES
It is enough to use RENAME, here as a data set option, as follows:
data nuovo(rename=(sesso=mf));
set corso.disney;
/* other statements */
run;
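RENAME also exists as a statement and as an option on the input data set. The two sketches below illustrate which name is valid inside the step in each case. The data set names NUOVO2 and NUOVO3 are hypothetical.

```sas
/* RENAME statement: the new name applies to the output data set only */
data nuovo2;
  set corso.disney;
  rename sesso=mf;
  if sesso='f';                        /* inside the step the variable is still SESSO */
run;

/* RENAME= on the input data set: the new name is used inside the step */
data nuovo3;
  set corso.disney(rename=(sesso=mf));
  if mf='f';
run;
```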
E5. BUILDING SEVERAL DATA SETS
The IF and SELECT statements are used; they make conditional choices possible.
Example.
data corso.maschi corso.femmine;
set corso.disney;
if sesso='m' then output corso.maschi;
else if sesso='f' then output corso.femmine;
else put 'osservazioni sbagliate' _all_;
proc print data=corso.maschi;
proc print data=corso.femmine;
run;
or:
data corso.maschi corso.femmine;
set corso.disney;
select(sesso);
when('m') output corso.maschi;
when('f') output corso.femmine;
otherwise put 'osservazioni sbagliate' _all_;
end;
proc print data=corso.maschi;
proc print data=corso.femmine;
run;
SAS OUTPUT: (in both cases)
OBS   NOME           SESSO   ETA   ALTEZZA   PESO
  1   pippo          m        32       190     54
  2   paperino       m        34       150     50
  3   qui            m         8       120     30
  4   quo            m         8       120     30
  5   qua            m         8       120     30

OBS   NOME           SESSO   ETA   ALTEZZA   PESO
  1   minnie         f        35       145     40
  2   clarabella     f        30       180     65
  3   nonna papera   f        99       140     55
  4   emy            f         8       117     25
  5   ely            f         8       117     25
  6   edy            f         8       117     25
If the variable SESSO contained a value other than 'm' or 'f', for example 'M', the Log window would show a message as specified in the PUT statement.
SAS PROGRAM no. 11 bis:
data errore;
set corso.disney;
if nome='paperino' then sesso='M';
run;
The data set ERRORE is then read in place of corso.disney by the step of the previous example:
set errore;
...
else put 'osservazioni sbagliate ' _all_;
run;
SAS LOG:
osservazioni sbagliate nome=paperino sesso=M eta=34 altezza=150 peso=50 _ERROR_=0 _N_=2
E6. CONCATENATING SEVERAL DATA SETS
The SET statement is used once more, as in the following example, in which the data sets have the same variables:
data corso.tutti;
set corso.maschi corso.femmine;
proc print data=corso.tutti; run;
SAS OUTPUT of PROC PRINT:
OBS   NOME           SESSO   ETA   ALTEZZA   PESO
  1   pippo          m        32       190     54
  2   paperino       m        34       150     50
  3   qui            m         8       120     30
  4   quo            m         8       120     30
  5   qua            m         8       120     30
  6   minnie         f        35       145     40
  7   clarabella     f        30       180     65
  8   nonna papera   f        99       140     55
  9   emy            f         8       117     25
 10   ely            f         8       117     25
 11   edy            f         8       117     25
The SET statement used in the following way would produce a different output:
data corso.tutti2;
set corso.maschi;
set corso.femmine;
proc print data=corso.tutti2; run;
SAS OUTPUT:
OBS   NOME           SESSO   ETA   ALTEZZA   PESO
  1   minnie         f        35       145     40
  2   clarabella     f        30       180     65
  3   nonna papera   f        99       140     55
  4   emy            f         8       117     25
  5   ely            f         8       117     25
The data set TUTTI2 has a number of observations equal to the smaller of the numbers of observations in MASCHI and FEMMINE; moreover, since in this case the two data sets have the same variables, the values of the second data set overwrite those of the first.
Two SET statements can be used when the data sets have different variables (measured on the same population), as in the following example. In this case the two data sets are placed "side by side".
data corso.maschi1;
set corso.maschi;
keep nome sesso eta;
data corso.maschi2;
set corso.maschi;
keep nome altezza peso;
data corso.maschi3;
set corso.maschi1;
set corso.maschi2;
proc print; run;
SAS OUTPUT:
OBS   NOME       SESSO   ETA   ALTEZZA   PESO
  1   pippo      m        32       190     54
  2   paperino   m        34       150     50
  3   qui        m         8       120     30
  4   quo        m         8       120     30
  5   qua        m         8       120     30
The variable NOME contains the values taken from the second data set.
If the data sets have a different number of observations, the MERGE statement builds a new data set containing all the variables of the input data sets, setting to missing the values that are not available.
A data set similar to the previous one could be obtained with the MERGE statement, as follows:
proc sort data=corso.maschi1 out=corso.maschi1s;
by nome;
proc sort data=corso.maschi2 out=corso.maschi2s;
by nome;
data corso.maschi3s;
merge corso.maschi1s corso.maschi2s;
by nome;
proc print data=corso.maschi3s; run;
SAS OUTPUT:
OBS   NOME       SESSO   ETA   ALTEZZA   PESO
  1   paperino   m        34       150     50
  2   pippo      m        32       190     54
  3   qua        m         8       120     30
  4   qui        m         8       120     30
  5   quo        m         8       120     30
ANOTHER EXAMPLE:
data uno;
input n $ x y;
datalines;
a 12 13
b 14 15
d 16 17
;
data due;
input n $ x z;
datalines;
a 22 23
b 24 25
c 26 27
;
data tre;
set uno;
set due;
proc print; run;
SAS OUTPUT:
Obs   n    x    y    z
  1   a   22   13   23
  2   b   24   15   25
  3   c   26   17   27     <-- PAY ATTENTION TO THIS OBSERVATION
data quattro;
merge uno due;
by n;
proc print; run;
SAS OUTPUT:
Obs   n    x    y    z
  1   a   22   13   23
  2   b   24   15   25
  3   c   26    .   27
  4   d   16   17    .
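With MERGE and BY it is often useful to know which input data set contributed to each observation. A common way to do this, sketched here on the data sets UNO and DUE, is the IN= data set option. The names ENTRAMBI, IN_UNO and IN_DUE are hypothetical.

```sas
/* ENTRAMBI, IN_UNO and IN_DUE are hypothetical names */
data entrambi;
  merge uno(in=in_uno) due(in=in_due);   /* IN= creates temporary 0/1 flags */
  by n;
  if in_uno and in_due;                  /* keeps only the BY values present in both data sets */
run;

proc print data=entrambi;
run;
```

Here only the observations with n equal to a and b are kept, because c appears only in DUE and d only in UNO. The flag variables are not written to the output data set.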
E7. READING DATA FROM DATA SETS OF A DIFFERENT "TYPE"
Consider the following example:
proc means data=corso.disney;              /* reads from the data set corso.disney */
var altezza peso;                          /* works only on the variables altezza and peso */
output out=sommario mean=m_alt m_peso;     /* names the two means and builds the data set SOMMARIO */
proc print data=sommario;
run;
OBS   _TYPE_   _FREQ_     M_ALT   M_PESO
  1        0       11   137.818       39
To build a data set containing the deviations from the means, proceed as follows.
data corso.diney1;
if _n_=1 then set sommario;
set corso.disney;
alt_c=altezza-m_alt;
peso_c=peso-m_peso;
drop _type_ _freq_;
proc print; run;
In this way a data set is built that contains the previous variables, plus the two means m_alt and m_peso, plus the two new variables alt_c and peso_c.
SAS OUTPUT:
OBS     M_ALT   M_PESO   NOME           SESSO   ETA   ALTEZZA   PESO      ALT_C   PESO_C
  1   137.818       39   pippo          m        32       190     54    52.1818       15
  2   137.818       39   paperino       m        34       150     50    12.1818       11
  3   137.818       39   minnie         f        35       145     40     7.1818        1
  4   137.818       39   clarabella     f        30       180     65    42.1818       26
  5   137.818       39   nonna papera   f        99       140     55     2.1818       16
  6   137.818       39   qui            m         8       120     30   -17.8182       -9
  7   137.818       39   quo            m         8       120     30   -17.8182       -9
  8   137.818       39   qua            m         8       120     30   -17.8182       -9
  9   137.818       39   emy            f         8       117     25   -20.8182      -14
 10   137.818       39   ely            f         8       117     25   -20.8182      -14
 11   137.818       39   edy            f         8       117     25   -20.8182      -14
If the statement if _n_=1 then ... were omitted, the data set built would contain all the variables but a number of observations equal to that of SOMMARIO (the first data set read with SET).
SAS OUTPUT:
OBS     M_ALT   M_PESO   NOME    SESSO   ETA   ALTEZZA   PESO     ALT_C   PESO_C
  1   137.818       39   pippo   m        32       190     54   52.1818       15
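When only the centred values are needed, and not the means themselves, a possible alternative to the two-step approach above is the STANDARD procedure. This is a sketch of a different technique from the one used in the notes. The output data set name CENTRATI is hypothetical.

```sas
/* CENTRATI is a hypothetical output name */
proc standard data=corso.disney mean=0 out=centrati;
  var altezza peso;     /* each listed variable is replaced by its deviation from its mean */
run;

proc print data=centrati;
run;
```

Unlike the DATA step with if _n_=1 then set, this overwrites ALTEZZA and PESO in the output data set instead of adding new variables.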
F. MORE ON THE DATA STEP
F1. SAS EXPRESSIONS AND FUNCTIONS
SAS EXPRESSIONS
They are the usual ones: constants, dates, operators on both character and numeric variables, etc.
SAS FUNCTIONS
SAS functions, as in all other programming languages, are ready-made routines that are invoked with a keyword and return a value computed from the arguments passed to the function.
A function call can take one of the following forms:
FUNCTION-NAME (arg1 , arg2 , ... , argn );
FUNCTION-NAME (OF var1 - varn );
FUNCTION-NAME (OF var1 var2 var3......varn );
(the first form is the most common)
The arguments of a function can be:
- numeric constants or variables
- alphanumeric constants or variables
- expressions, including those in which other functions appear
SAS functions can be divided into the following classes:
- arithmetic functions
for example ABS, MIN, MAX, DIM (gives the dimension of an array), HBOUND, LBOUND (give the bounds of an array), etc.
- truncation functions
- mathematical functions
for example EXP, LOG, GAMMA (complete Gamma function), LGAMMA (natural logarithm of the Gamma function).
- trigonometric functions
- probability functions
integral value        quantile value
POISSON
PROBBETA              BETAINV
PROBBNML
PROBCHI               CINV
PROBF                 FINV
PROBGAM               GAMINV
PROBNORM              PROBIT
PROBT                 TINV
(with the appropriate parameters)
- statistical functions
for example MIN (minimum), MAX (maximum), MEAN (mean), N (number of non-missing values), NMISS (number of missing values), RANGE (range), STD (standard deviation), SUM (sum), VAR (variance), USS (sum of squares of the data), CSS (sum of squares of the data centred on the mean).
- functions for generating random numbers
for example NORMAL (generates a normal variate), RANBIN (generates an observation from a binomial), RANEXP (generates an observation from an exponential with parameter 1), RANGAM, etc.
- string-handling functions
- date and time functions
- system functions
- special functions
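As a sketch of how the probability functions in the table pair up, the step below evaluates a distribution function and the corresponding quantile function and writes the results to the Log. The numeric arguments are arbitrary illustrative values.

```sas
data _null_;                 /* no data set is built; results go to the Log */
  p = probnorm(1.96);        /* P(Z <= 1.96) for a standard normal, about 0.975 */
  z = probit(0.975);         /* inverse of PROBNORM, about 1.96 */
  t = tinv(0.975, 10);       /* 0.975 quantile of Student's t with 10 degrees of freedom */
  q = cinv(0.95, 1);         /* 0.95 quantile of a chi-square with 1 degree of freedom */
  put p= z= t= q=;
run;
```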
DIFFERENCE BETWEEN FUNCTIONS AND PROCEDURES
Functions compute statistics for each observation (row) of the SAS data set and produce as many results as there are observations.
Procedures compute statistics for the variables (columns) of the SAS data set.
data temperature;
input citta $ t6 t12 t18;
media_temp=mean(t6,t12,t18);
/* equivalent statements:
media_temp=mean(of t6 t12 t18);
media_temp=mean(of t6--t18);      */
datalines;
Genova 19 24 22
Milano 15 18 18
Napoli 26 30 29
;
run;
proc print;
run;
SAS OUTPUT:
Obs   citta    t6   t12   t18   media_temp
  1   Genova   19    24    22      21.6667
  2   Milano   15    18    18      17.0000
  3   Napoli   26    30    29      28.3333
proc means;
run;
SAS OUTPUT:
The MEANS Procedure
Variable      N         Mean      Std Dev      Minimum      Maximum
-------------------------------------------------------------------
t6            3   20.0000000    5.5677644   15.0000000   26.0000000
t12           3   24.0000000    6.0000000   18.0000000   30.0000000
t18           3   23.0000000    5.5677644   18.0000000   29.0000000
media_temp    3   22.3333333    5.6960025   17.0000000   28.3333333
-------------------------------------------------------------------
F2. VALORI MANCANTI
Il valore di una variabile viene messo a missing se il campo di input è blank oppure è un punto
(salvo diversa specificazione di formato).
I valori mancanti si propagano nelle espressioni aritmetiche; nelle funzioni, invece, il discorso
cambia.
I valori mancanti nei confronti vengono messi a "meno infinito".
Example.
   data es5;
   input dato1 dato2;
   somma=dato1+dato2;
   totale=sum(dato1,dato2);
   media1=(dato1+dato2)/2;
   media2=mean(dato1,dato2);
   datalines;
   1 3
   6 4
   . 78
   8 1
   12 14
   ;
   proc print;run;
SAS OUTPUT:
   Obs    dato1    dato2    somma    totale    media1    media2
    1        1        3        4        4        2.0       2.0
    2        6        4       10       10        5.0       5.0
    3        .       78        .       78         .       78.0
    4        8        1        9        9        4.5       4.5
    5       12       14       26       26       13.0      13.0
Note that:
   . + 78          gives  .
   sum( . , 78)    gives  78
In general, functions "ignore" missing values: SUM adds only the non-missing arguments (a missing
argument counts as 0), MEAN divides the sum of the non-missing values by the number of non-missing
values, and so on. If all the arguments are missing, the result is missing.
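The rule above can be checked with a short sketch. The data set name missing_demo and the variables a, b, n_valid, n_miss, s1 and s2 are made up for this illustration; N, NMISS and SUM are standard SAS functions.

```sas
data missing_demo;
  input a b;
  n_valid = n(a, b);       /* number of non-missing arguments          */
  n_miss  = nmiss(a, b);   /* number of missing arguments              */
  s1 = sum(a, b);          /* missing only when both a and b are missing */
  s2 = sum(0, a, b);       /* the constant 0 keeps the result non-missing */
datalines;
1 3
. 78
. .
;
run;

proc print data=missing_demo;
run;
```

Adding a constant 0 to the argument list, as in s2, is a common way to obtain 0 instead of a missing value when every argument is missing.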
F3. CUMULATIVE SUMS
RETAIN statement
It is a non-executable statement, so it can be placed anywhere in the DATA step;
it does two things:
- it keeps the values of the variables from the previous iteration of the DATA step
- it assigns initial values to the variables.
The syntax of the RETAIN statement is:
RETAIN variables initial-values;
Example.
   DATA ADD;
   RETAIN TOTALE 0;
   INPUT PUNTEGGI;
   TOTALE=TOTALE+PUNTEGGI;
   DATALINES;
   10
   3
   7
   5
   ;
   PROC PRINT;RUN;
SAS OUTPUT:
   OBS    TOTALE    PUNTEGGI
    1       10         10
    2       13          3
    3       20          7
    4       25          5

   DATA ADD2;
   SET ADD;
   RETAIN CONTO 0;
   CONTO=CONTO+3;
   PROC PRINT;RUN;
SAS OUTPUT:
   OBS    TOTALE    PUNTEGGI    CONTO
    1       10         10          3
    2       13          3          6
    3       20          7          9
    4       25          5         12
Sum statement
Cumulative sums, such as the one for the variable CONTO in the previous example, can be written
with the following shorthand:
   CONTO + 3;
which corresponds to the statements
   RETAIN CONTO 0;
   CONTO=sum(CONTO, 3);
Its general syntax is:
   variable + expression;
Some examples of possible expressions:
   bilancia + (- debito);
   somma2 + x*x;
   nx + (x ne .);
The last statement counts the non-missing values: the logical expression (x ne .) is 1 when it is
true and 0 otherwise.
RETAIN and missing values
   DATA ADD;
   RETAIN CONTO TOTALE 0;
   INPUT PUNTEGGI;
   TOTALE=TOTALE+PUNTEGGI;
   CONTO=CONTO+3;
   DATALINES;
   10
   3
   7
   .
   6
   4
   ;
   PROC PRINT;RUN;
(the fourth data line is a missing value)
SAS OUTPUT:
   OBS    CONTO    TOTALE    PUNTEGGI
    1        3       10         10
    2        6       13          3
    3        9       20          7
    4       12        .          .
    5       15        .          6
    6       18        .          4
To give TOTALE a value in observations 4, 5 and 6 as well, write:
   TOTALE = SUM (TOTALE, PUNTEGGI);
If the RETAIN statement were omitted, the variables TOTALE and CONTO would contain only missing
values.
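As a sketch of the alternative just described, the sum statement gives the same running total without an explicit RETAIN, because it retains its variable, starts it at 0 and skips missing values. The data set name add_sum and the counter n_validi are hypothetical names chosen for this illustration.

```sas
data add_sum;
  input punteggi;
  totale + punteggi;            /* running total; a missing score is skipped */
  n_validi + (punteggi ne .);   /* running count of non-missing scores       */
datalines;
10
3
7
.
6
4
;
run;

proc print data=add_sum;
run;
```

With these data TOTALE should stay at 20 in the fourth observation and then continue to 26 and 30, instead of becoming missing.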
F4. MORE ON THE DATA STEP (from the SAS online Help)
Flow of Action
When you submit a DATA step for execution, it is first compiled and then executed. The following figure
shows the flow of action for a typical SAS DATA step.
[Figure: Flow of Action in the DATA Step]
The Compilation Phase
When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and
compiles them, that is, automatically translates the statements into machine code. In this phase, SAS
identifies the type and length of each new variable, and determines whether a type conversion is
necessary for each subsequent reference to a variable. During the compile phase, SAS creates the
following three items:
input buffer
   is a logical area in memory into which SAS reads each record of raw data when SAS
   executes an INPUT statement. Note that this buffer is created only when the DATA step
   reads raw data. (When the DATA step reads a SAS data set, SAS reads the data directly
   into the program data vector.)
program data vector (PDV)
   is a logical area in memory where SAS builds a data set, one observation at a time.
   When a program executes, SAS reads data values from the input buffer or creates them
   by executing SAS language statements. The data values are assigned to the appropriate
   variables in the program data vector. From here, SAS writes the values to a SAS data set
   as a single observation.
   Along with data set variables and computed variables, the PDV contains two automatic
   variables, _N_ and _ERROR_. The _N_ variable counts the number of times the DATA
   step begins to iterate. The _ERROR_ variable signals the occurrence of an error caused
   by the data during execution. The value of _ERROR_ is either 0 (indicating no errors
   exist), or 1 (indicating that one or more errors have occurred). SAS does not write these
   variables to the output data set.
descriptor information
   is information that SAS creates and maintains about each SAS data set, including data
   set attributes and variable attributes. It contains, for example, the name of the data set
   and its member type, the date and time that the data set was created, and the number,
   names and data types (character or numeric) of the variables.
The Execution Phase
By default, a simple DATA step iterates once for each observation that is being created. The flow of
action in the Execution Phase of a simple DATA step is described as follows:
1. The DATA step begins with a DATA statement. Each time the DATA statement executes, a new
iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.
2. SAS sets the newly created program variables to missing in the program data vector (PDV).
3. SAS reads a data record from a raw data file into the input buffer, or it reads an observation from
a SAS data set directly into the program data vector. You can use an INPUT, MERGE, SET,
MODIFY, or UPDATE statement to read a record.
4. SAS executes any subsequent programming statements for the current record.
5. At the end of the statements, an output, return, and reset occur automatically. SAS writes an
observation to the SAS data set, the system automatically returns to the top of the DATA step, and
the values of variables created by INPUT and assignment statements are reset to missing in the
program data vector. Note that variables that you read with a SET, MERGE, MODIFY, or
UPDATE statement are not reset to missing here.
6. SAS counts another iteration, reads the next record or observation, and executes the subsequent
programming statements for the current observation.
7. The DATA step terminates when SAS encounters the end-of-file in a SAS data set or a raw data file.
Note: The figure shows the default processing of the DATA step. You can code data-reading statements
(such as INPUT or SET), or data-writing statements (such as OUTPUT), in any order in your program.
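The execution steps listed above can be observed in the log with a small sketch. The variables x and y and the text labels are chosen only for this illustration; _N_ and _ERROR_ are the automatic variables described earlier.

```sas
data _null_;
  /* top of the iteration: x and y have just been reset to missing */
  put 'Before INPUT: ' _n_= x= y=;
  input x;
  y = x * 2;
  put 'After  INPUT: ' _n_= x= y= _error_=;
datalines;
10
20
;
run;
```

The "Before INPUT" line should appear once more than the "After INPUT" line: the third iteration starts, but the step stops as soon as INPUT meets the end of the data.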
Processing a DATA Step: A Walkthrough
Sample DATA Step
The following statements provide an example of a DATA step that reads raw data, calculates totals, and
creates a data set:
   data total_points (drop=TeamName);                          [1]
   input TeamName $ ParticipantName $ Event1 Event2 Event3;    [2]
   TeamTotal + (Event1 + Event2 + Event3);                     [3]
   datalines;
   Knights Sue 6 8 8
   Cardinals Jane 9 7 8
   Knights John 7 7 7
   Knights Lisa 8 9 9
   Knights Fran 7 6 6
   Knights Walter 9 8 10
   ;
[1] The DROP= data set option prevents the variable TeamName from being written to the output SAS
    data set called TOTAL_POINTS.
[2] The INPUT statement describes the data by giving a name to each variable, identifying its data type
    (character or numeric), and identifying its relative location in the data record.
[3] The Sum statement accumulates the scores for three events in the variable TeamTotal.
Creating the Input Buffer and the Program Data Vector
When DATA step statements are compiled, SAS determines whether to create an input buffer. If the input
file contains raw data (as in the example above), SAS creates an input buffer to hold the data before
moving the data to the program data vector (PDV). (If the input file is a SAS data set, however, SAS does
not create an input buffer. SAS writes the input data directly to the PDV.)
The PDV contains all the variables in the input data set, the variables created in DATA step statements,
and the two variables, _N_ and _ERROR_, that are automatically generated for every DATA step. The
_N_ variable represents the number of times the DATA step has iterated. The _ERROR_ variable acts
like a binary switch whose value is 0 if no errors exist in the DATA step, or 1 if one or more errors exist.
The following figure shows the Input Buffer and the program data vector after DATA step compilation.
Input Buffer and Program Data Vector
Variables that are created by the INPUT and the Sum statements (TeamName, ParticipantName, Event1,
Event2, Event3, and TeamTotal) are set to missing initially. Note that in this representation, numeric
variables are initialized with a period and character variables are initialized with blanks. The automatic
variable _N_ is set to 1; the automatic variable _ERROR_ is set to 0.
The variable TeamName is marked Drop in the PDV because of the DROP= data set option in the DATA
statement. Dropped variables are not written to the SAS data set. The _N_ and _ERROR_ variables are
dropped because automatic variables created by the DATA step are not written to a SAS data set. See
SAS Variables for details about automatic variables.
Reading a Record
SAS reads the first data line into the input buffer. The input pointer, which SAS uses to keep its place as
it reads data from the input buffer, is positioned at the beginning of the buffer, ready to read the data
record. The following figure shows the position of the input pointer in the input buffer before SAS reads
the data.
Position of the Pointer in the Input Buffer Before SAS Reads Data
The INPUT statement then reads data values from the record in the input buffer and writes them to the
PDV where they become variable values. The following figure shows both the position of the pointer in
the input buffer, and the values in the PDV after SAS reads the first record.
Values from the First Record are Read into the Program Data Vector
After the INPUT statement reads a value for each variable, SAS executes the Sum statement. SAS
computes a value for the variable TeamTotal and writes it to the PDV. The following figure shows the
PDV with all of its values before SAS writes the observation to the data set.
Program Data Vector with Computed Value of the Sum Statement
Writing an Observation to the SAS Data Set
When SAS executes the last statement in the DATA step, all values in the PDV, except those marked to
be dropped, are written as a single observation to the data set TOTAL_POINTS. The following figure
shows the first observation in the TOTAL_POINTS data set.
The First Observation in Data Set TOTAL_POINTS
SAS then returns to the DATA statement to begin the next iteration. SAS resets the values in the PDV in
the following way:
• The values of variables created by the INPUT statement are set to missing.
• The value created by the Sum statement is automatically retained.
• The value of the automatic variable _N_ is incremented by 1, and the value of _ERROR_ is reset
to 0.
The following figure shows the current values in the PDV.
Current Values in the Program Data Vector
Reading the Next Record
SAS reads the next record into the input buffer. The INPUT statement reads the data values from the
input buffer and writes them to the PDV. The Sum statement adds the values of Event1, Event2, and
Event3 to TeamTotal. The value of 2 for variable _N_ indicates that SAS is beginning the second
iteration of the DATA step. The following figure shows the input buffer, the PDV for the second record,
and the SAS data set with the first two observations.
Input Buffer, Program Data Vector, and First Two Observations
As SAS continues to read records, the value in TeamTotal grows larger as more participant scores are
added to the variable. _N_ is incremented at the beginning of each iteration of the DATA step. This
process continues until SAS reaches the end of the input file.
When the DATA Step Finishes Executing
The DATA step stops executing after it processes the last input record. You can use PROC PRINT to
print the output in the TOTAL_POINTS data set:
Output from the Walkthrough DATA Step
   Total Team Scores                                            1

          Participant                                   Team
   Obs    Name           Event1    Event2    Event3    Total
    1     Sue               6         8         8        22
    2     Jane              9         7         8        46
    3     John              7         7         7        67
    4     Lisa              8         9         9        93
    5     Fran              7         6         6       112
    6     Walter            9         8        10       139
F5. ARRAYS (from the SAS online Help)
Syntax
ARRAY array-name { subscript } <$><length> <array-elements> <(initial-value-list)>;
Arguments
array-name
names the array.
{subscript}
describes the number and arrangement of elements in the array by using an asterisk, a number, or
a range of numbers. Subscript has one of these forms:
{dimension-size(s)}
indicates the number of elements in each dimension of the array. Dimension-size is a numeric
representation of either the number of elements in a one-dimensional array or the number of
elements in each dimension of a multidimensional array.
$
indicates that the elements in the array are character elements.
length
specifies the length of elements in the array that have not been previously assigned a length.
array-elements
names the elements that make up the array. Array-elements must be either all numeric or all
character, and they can be listed in any order. The elements can be
variables
lists variable names.
(initial-value-list)
gives initial values for the corresponding elements in the array. The values for elements can be
numbers or character strings. You must enclose all character strings in quotation marks. To
specify one or more initial values directly, use the following format:
(initial-value(s))
To specify an iteration factor and nested sublists for the initial values, use the following format:
<constant-iter-value*> <(>constant value | constant-sublist<)>
Examples
Example 1: Defining Arrays
• array rain {5} janr febr marr aprr mayr;
• array days{7} d1-d7;
• array month{*} jan feb jul oct nov;
• array x{*} _NUMERIC_;
• array qbx{10};
• array meal{3};
Example 2: Assigning Initial Numeric Values
• array test{4} t1 t2 t3 t4 (90 80 70 70);
• array test{4} t1-t4 (90 80 2*70);
• array test{4} _TEMPORARY_ (90 80 70 70);
Example 3: Defining Initial Character Values
• array test2{*} a1 a2 a3 ('a','b','c');
Example 5: Using Iterative DO-Loop Processing
In this example, the statements process each element of the array, using the value of variable I as the
subscript on the array references for each iteration of the DO loop. If an array element has a value of 99,
the IF-THEN statement changes that value to 100.
array days{7} d1-d7;
do i=1 to 7;
if days{i}=99 then days{i}=100;
end;
Example 6: Referencing Many Arrays in One Statement
You can refer to more than one array in a single SAS statement. In this example, you create two arrays,
DAYS and HOURS. The statements inside the DO loop substitute the current value of variable I to
reference each array element in both arrays.
   array days{7} d1-d7;
   array hours{7} h1-h7;
   do i=1 to 7;
   if days{i}=99 then days{i}=100;
   hours{i}=days{i}*24;
   end;
Example 8: Using the Asterisk References as a Variable List
• array cost{10} cost1-cost10;
  totcost=sum(of cost {*});
• array days{7} d1-d7;
  input days {*};
• array hours{7} h1-h7;
  put hours {*};
WARNING: the array structure is not stored in the data set; it can be used only in the DATA step
in which the array is defined.
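The array statements above can be put together in a complete DATA step, as in the following sketch. The data set name giorni, the variable totale and the choice of recoding 99 to missing are assumptions made for this illustration.

```sas
data giorni;
  input d1-d7;
  array days{7}  d1-d7;
  array hours{7} h1-h7;
  do i = 1 to 7;
    if days{i} = 99 then days{i} = .;   /* assumption: 99 is a code for "unknown" */
    hours{i} = days{i} * 24;            /* missing days give missing hours        */
  end;
  totale = sum(of days{*});             /* the array used as a variable list      */
  drop i;
datalines;
1 2 99 4 5 6 7
;
run;

proc contents data=giorni;
run;
```

PROC CONTENTS should list only the variables d1-d7, h1-h7 and totale: the names DAYS and HOURS exist during the DATA step and are not stored in the data set.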
F6. CONTROL STATEMENTS
1)
   IF expression THEN statement ;
   ELSE statement ;
   (the expression evaluates to 1 if true, to 0 if false)
- if several statements have to be executed:
   IF expression THEN DO;
      statements ;
   END;
   ELSE DO;
      statements ;
   END;
- the statement
   IF expression ;
  is equivalent to
   IF NOT (expression) THEN DELETE;
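EXAMPLE. SAS PROGRAM no. 21:
   data corso.maschi corso.femmine;
   set corso.disney;
   if sesso='m' then output corso.maschi;
   else if sesso='f' then output corso.femmine;
   else put 'osservazioni sbagliate ' _all_;
   run;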
2)
   SELECT (variable) ;
      WHEN (expression) statement ;
      OTHERWISE statement ;
   END;
   (in WHEN, the expression must evaluate to a single value)
EXAMPLE. SAS PROGRAM no. 21 bis:
   data corso.maschi corso.femmine;
   set corso.disney;
   select(sesso);
      when('m') output corso.maschi;
      when('f') output corso.femmine;
      otherwise put 'osservazioni sbagliate' _all_;
   end;
   run;
Warning: with SELECT (variable), WHEN can only match exact values; for other conditions use IF.
EXAMPLE. SAS PROGRAM no. 21 ter:
   data piccoli grandi;
   set corso.disney;
   if altezza <150 then output piccoli;
   else output grandi;
   proc print data=piccoli; proc print data=grandi; run;

   data piccoli grandi;
   set corso.disney;
   select(altezza);
      when(<150) output piccoli;
      otherwise output grandi;
   end;
   run;
   /* check the log: this does not work! WHEN needs an exact value */
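For completeness, a hedged sketch of a form that does accept conditions: when SELECT is written without an expression in parentheses, each WHEN takes a logical expression. The small data set persone and its values are invented here to stand in for corso.disney.

```sas
data persone;                 /* hypothetical stand-in for corso.disney */
  input nome $ altezza;
datalines;
Pippo 180
Qui 120
Minni 150
;
run;

data piccoli grandi;
  set persone;
  select;                                  /* no select-expression           */
    when (altezza < 150) output piccoli;   /* WHEN holds a logical condition */
    otherwise            output grandi;
  end;
run;

proc print data=piccoli; run;
proc print data=grandi;  run;
```

Only Qui should end up in PICCOLI; the other two observations go to GRANDI.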
3)
   DO index-variable = start-value [TO end-value] [BY increment]
      [WHILE (expression)] [UNTIL (expression)] ;
      statements ;
   END;
where: - the index variable can also be a character variable;
       - start-value and end-value can be replaced by a list of values separated by ","
Warning: the data set receives only the values that the variables (created or modified by the
   statements "inside the DO") have when the DO loop has finished; to keep the values the
   variables take at each iteration of the DO, an OUTPUT statement must be written
   "inside the DO".
EXAMPLE. SAS PROGRAM no. 22:
   data es4;
   do i=-1 to 1 by .1;
   x=probnorm(i);
   output;
   end;
   proc print; run;
SAS OUTPUT:
   OBS        I          X
     1    -1.00000    0.15866
     2    -0.90000    0.18406
     3    -0.80000    0.21186
     4    -0.70000    0.24196
     5    -0.60000    0.27425
     6    -0.50000    0.30854
     7    -0.40000    0.34458
     8    -0.30000    0.38209
     9    -0.20000    0.42074
    10    -0.10000    0.46017
    11    -0.00000    0.50000
    12     0.10000    0.53983
    13     0.20000    0.57926
    14     0.30000    0.61791
    15     0.40000    0.65542
    16     0.50000    0.69146
    17     0.60000    0.72575
    18     0.70000    0.75804
    19     0.80000    0.78814
    20     0.90000    0.81594
    21     1.00000    0.84134
4)
   DO WHILE (expression) ;
5)
   DO UNTIL (expression) ;
What was said about the OUTPUT statement for the DO statement also applies to DO WHILE and
DO UNTIL.
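The two forms differ in when the condition is tested: DO WHILE evaluates it at the top of the loop, DO UNTIL at the bottom, so a DO UNTIL body always runs at least once. A minimal sketch (the variable n and the messages are chosen only for this illustration):

```sas
data _null_;
  n = 10;
  do while (n < 5);      /* tested before the body: never entered */
    put 'WHILE body, n=' n;
    n + 1;
  end;
  n = 10;
  do until (n >= 5);     /* tested after the body: entered once   */
    put 'UNTIL body, n=' n;
    n + 1;
  end;
run;
```

The log should contain a single "UNTIL body" line and no "WHILE body" line.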
6)
   GOTO label ;
7)
   LINK label ;
   labels are declared by writing the label name followed by a colon before a statement
   (label: statement;)
EXAMPLES OF THE DO LOOP
BUILDING THE ORIGINAL DATA SET
   data uno;
   input x y a$;
   datalines;
   5 7 n
   3 2 a
   4 7 a
   2 1 n
   5 5 n
   ;

   data due;
   set uno;
   do i=5;
   z=x*i;
   end;
   proc print;run;

   Obs    x    y    a    i     z
    1     5    7    n    5    25
    2     3    2    a    5    15
    3     4    7    a    5    20
    4     2    1    n    5    10
    5     5    5    n    5    25
   data tre;
   set uno;
   do i=5,7;
   z=x*i;
   end;
   proc print;run;

   Obs    x    y    a    i     z
    1     5    7    n    7    35
    2     3    2    a    7    21
    3     4    7    a    7    28
    4     2    1    n    7    14
    5     5    5    n    7    35
   data quattro;
   set uno;
   do i=5,7;
   z=x*i;
   output;
   end;
   proc print;run;

   Obs    x    y    a    i     z
     1    5    7    n    5    25
     2    5    7    n    7    35
     3    3    2    a    5    15
     4    3    2    a    7    21
     5    4    7    a    5    20
     6    4    7    a    7    28
     7    2    1    n    5    10
     8    2    1    n    7    14
     9    5    5    n    5    25
    10    5    5    n    7    35
   data cinque;
   set uno;
   do i=y;
   z=x*i;
   output;
   end;
   proc print;run;

   Obs    x    y    a    i     z
    1     5    7    n    7    35
    2     3    2    a    2     6
    3     4    7    a    7    28
    4     2    1    n    1     2
    5     5    5    n    5    25
   data sei;
   set uno;
   do i=y, x;
   z=x*i;
   output;
   end;
   proc print;run;

   Obs    x    y    a    i     z
     1    5    7    n    7    35
     2    5    7    n    5    25
     3    3    2    a    2     6
     4    3    2    a    3     9
     5    4    7    a    7    28
     6    4    7    a    4    16
     7    2    1    n    1     2
     8    2    1    n    2     4
     9    5    5    n    5    25
    10    5    5    n    5    25
   data sette;
   set uno;
   do i=y, x;
   z=x*i;
   end;
   proc print;run;

   Obs    x    y    a    i     z
    1     5    7    n    5    25
    2     3    2    a    3     9
    3     4    7    a    4    16
    4     2    1    n    2     4
    5     5    5    n    5    25
   data otto;
   set uno;
   do i=1 to 7 by 2;
   z=x*i;
   end;
   proc print;run;

   Obs    x    y    a    i     z
    1     5    7    n    9    35
    2     3    2    a    9    21
    3     4    7    a    9    28
    4     2    1    n    9    14
    5     5    5    n    9    35
   data nove;
   set uno;
   do i=1 to 7 by 2;
   z=x*i;
   output;
   end;
   proc print;run;

   Obs    x    y    a    i     z
     1    5    7    n    1     5
     2    5    7    n    3    15
     3    5    7    n    5    25
     4    5    7    n    7    35
     5    3    2    a    1     3
     6    3    2    a    3     9
     7    3    2    a    5    15
     8    3    2    a    7    21
     9    4    7    a    1     4
    10    4    7    a    3    12
    11    4    7    a    5    20
    12    4    7    a    7    28
    13    2    1    n    1     2
    14    2    1    n    3     6
    15    2    1    n    5    10
    16    2    1    n    7    14
    17    5    5    n    1     5
    18    5    5    n    3    15
    19    5    5    n    5    25
    20    5    5    n    7    35
   data dieci;
   set uno;
   do i='r','s','t';
   z=trim(a)||i;
   output;
   end;
   proc print;run;

   Obs    x    y    a    i    z
     1    5    7    n    r    nr
     2    5    7    n    s    ns
     3    5    7    n    t    nt
     4    3    2    a    r    ar
     5    3    2    a    s    as
     6    3    2    a    t    at
     7    4    7    a    r    ar
     8    4    7    a    s    as
     9    4    7    a    t    at
    10    2    1    n    r    nr
    11    2    1    n    s    ns
    12    2    1    n    t    nt
    13    5    5    n    r    nr
    14    5    5    n    s    ns
    15    5    5    n    t    nt

   data undici;
   set uno;
   n=0;
   do until(n>5);
   n+1;
   output;
   end;
   proc print;run;

   Obs    x    y    a    n
     1    5    7    n    1
     2    5    7    n    2
     3    5    7    n    3
     4    5    7    n    4
     5    5    7    n    5
     6    5    7    n    6
     7    3    2    a    1
     8    3    2    a    2
     9    3    2    a    3
    10    3    2    a    4
    11    3    2    a    5
    12    3    2    a    6
    13    4    7    a    1
    14    4    7    a    2
    15    4    7    a    3
    16    4    7    a    4
    17    4    7    a    5
    18    4    7    a    6
    19    2    1    n    1
    20    2    1    n    2
    21    2    1    n    3
    22    2    1    n    4
    23    2    1    n    5
    24    2    1    n    6
    25    5    5    n    1
    26    5    5    n    2
    27    5    5    n    3
    28    5    5    n    4
    29    5    5    n    5
    30    5    5    n    6

   data dodici;
   n=0;
   do until (n>5);
   n+1;
   output;
   end;
   proc print;run;

   Obs    n
    1     1
    2     2
    3     3
    4     4
    5     5
    6     6

   data tredici;
   n=0;
   do while (n<=5);
   n+1;
   output;
   end;
   proc print;run;

   Obs    n
    1     1
    2     2
    3     3
    4     4
    5     5
    6     6
F7. THE INPUT STATEMENT
The INPUT statement reads raw data (not SAS data sets) from a file on disk, from the program, etc.
There is no limit to the number of INPUT statements that can appear in a DATA step.
Each INPUT statement reads:
- from the file named by an INFILE statement, which must precede the INPUT statement;
- from the program itself if there is no INFILE statement. In this case, to tell SAS where the data
begin, a DATALINES (or CARDS) statement must appear as the last statement of the DATA step,
immediately before the data.
SAS offers several styles of input; the main ones are:
1) column input
2) list input
3) formatted input
The three styles can be combined in the same statement.
COLUMN INPUT:
The syntax of the statement is:
INPUT var_1 [$] start_col_1 [ - end_col_1 ] [ . decimals_1 ] ....
[ var_n [$] start_col_n [ - end_col_n ] [ . decimals_n ] ];
The following rules apply:
- the $ indicates a character variable
- the data fields can be read in any order
- fields or parts of fields can be read more than once
Example:
   INPUT NOME $ 1-8 INIZIALE $ 1 ;
- character variables can be at most 200 characters long in SAS version 6 (32,767 in later
versions); the length is set with the LENGTH statement. They can contain blanks as part of the
value (e.g. DE LUCA).
Some remarks on column input:
- missing values are coded as a period (inside the field) or as blanks;
- character values are left-aligned before being assigned to the variables.
For example, with the statement INPUT SESSO $ 1-3 the value M is read in the same way whether it
appears in the first, the second or the third column.
- numeric values can appear anywhere in the field; the sign, the decimal digits or the exponent
can be given.
Example:
with the statement
   INPUT X 1-6;
any of the following values, placed anywhere in columns 1-6 and padded with blanks, can be read:
   23
   23.0
   2.3E1
   -23
- blanks are not allowed inside a numeric value (for example, - 23 is not valid)
Example of column input:
   INPUT PESO 20-24 ETA 13-14 NOME $ 1-8 SESSO $ 11 ALTEZZA 16-18 INIZ_NOM $ 1;
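A runnable sketch of column input follows. The two data lines are invented and aligned to the columns named in the INPUT statement (name in 1-8, sex in 11, age in 13-14, height in 16-18, weight in 20-24).

```sas
data classe;
  input nome $ 1-8 sesso $ 11 eta 13-14 altezza 16-18 peso 20-24
        iniz_nom $ 1;      /* column 1 is read a second time */
datalines;
GIANNI    M 12 155  48.2
MARCO     M 12 151  43.7
;
run;

proc print data=classe;
run;
```

Because every field is located by its columns, the variables could be listed in the INPUT statement in any order without changing the data lines.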
LIST INPUT:
The syntax of the statement is:
INPUT variable_1 [$] [&] ........... [ variable_n [$] [&] ] ;
The following rules apply:
- the $ indicates a character variable;
- the fields must be listed in the order in which they appear in the input data;
- the fields must be separated by one or more blanks, or by another character specified with the
DELIMITER= option of the INFILE statement (see below);
- character variables can be at most 8 characters long (unless the LENGTH statement is used);
- if blanks are part of the value (e.g. DE LUCA), the & symbol must be added to the
declaration. In that case the next field must be separated by at least two blanks.
Remarks on list input:
- SAS looks for the first non-blank character in the line; from there up to the next blank (or the
next two blanks, if the & symbol is used) the field read is the value of the variable;
- the variables must appear in the INPUT statement in the same order as in the data;
- all missing values must be coded as a period;
- the INPUT statement ends when a value has been assigned to every variable. If the data in a
line are not enough, the next record is read.
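The rules on LENGTH, the & modifier and the period for missing values can be tried with this sketch. The data set persone, its variables and its two data lines are hypothetical; note the two blanks after each surname.

```sas
data persone;
  length cognome $ 20;            /* without LENGTH the limit would be 8 characters */
  input cognome $ & eta citta $;  /* & lets the value contain single blanks         */
datalines;
DE LUCA  34 Genova
ROSSI  . Milano
;
run;

proc print data=persone;
run;
```

The LENGTH statement is placed before INPUT so that the length is fixed when SAS first meets the variable.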
EXAMPLES
INPUT FILE:
   Country  Car                         MPG   Weight  Drive  Horse  Displa  Cyl.  Accel.
                                                      Ratio  power  cement
   U.S.     Buick Estate Wagon          16.9  4.360   2.73   155    350     8     14.9
   U.S.     Ford Country Squire Wagon   15.5  4.054   2.26   142    351     8     14.3
   U.S.     Chevy Malibu Wagon          19.2  3.605   2.56   125    267     8     15.0
   U.S.     Chrysler LeBaron Wagon      18.5  3.940   2.45   150    360     8     13.0
   U.S.     Chevette                    30.0  2.155   3.70    68     98     4     16.5
   Japan    Toyota Corona               27.5  2.560   3.05    95    134     4     14.2
   Japan    Datsun 510                  27.2  2.300   3.54    97    119     4     14.7
   U.S.     Dodge Omni                  30.9  2.230   3.37    75    105     4     14.5
   Germany  Audi 5000                   20.3  2.830   3.90   103    131     5     15.9
   Sweden   Volvo 240 GL                17.0  3.140   3.50   125    163     6     13.6
   Sweden   Saab 99 GLE                 21.6  2.795   3.77   115    121     4     15.7
   France   Peugeot 694 SL              16.2  3.410   3.58   133    163     6     15.8
   U.S.     Buick Century Special       20.6  3.380   2.73   105    231     6     15.8
   U.S.     Mercury Zephyr              20.8  3.070   3.08    85    200     6     16.7
   U.S.     Dodge Aspen                 18.6  3.620   2.71   110    225     6     18.7
   U.S.     AMC Concord D/L             18.1  3.410   2.73   120    258     6     15.1
   U.S.     Chevy Caprice Classic       17.0  3.840   2.41   130    305     8     15.4
   U.S.     Ford LTD                    17.6  3.725   2.26   129    302     8     13.4
   U.S.     Mercury Grand Marquis       16.5  3.955   2.26   138    351     8     13.2
   U.S.     Dodge St Regis              18.2  3.830   2.45   135    318     8     15.2
   U.S.     Ford Mustang 4              26.5  2.585   3.08    88    140     4     14.4
   U.S.     Ford Mustang Ghia           21.9  2.910   3.08   109    171     6     16.6
   Japan    Mazda GLC                   34.1  1.975   3.73    65     86     4     15.2
   Japan    Dodge Colt                  35.1  1.915   2.97    80     98     4     14.4
   U.S.     AMC Spirit                  27.4  2.670   3.08    80    121     4     15.0
   Germany  VW Scirocco                 31.5  1.990   3.78    71     89     4     14.9
   Japan    Honda Accord LX             29.5  2.135   3.05    68     98     4     16.6
   U.S.     Buick Skylark               28.4  2.670   2.53    90    151     4     16.0
   U.S.     Chevy Citation              28.8  2.595   2.69   115    173     6     11.3
   U.S.     Olds Omega                  26.8  2.700   2.84   115    173     6     12.9
   U.S.     Pontiac Phoenix             33.5  2.556   2.69    90    151     4     13.2
   U.S.     Plymouth Horizon            34.2  2.200   3.37    70    105     4     13.2
   Japan    Datsun 210                  31.8  2.020   3.70    65     85     4     19.2
   Italy    Fiat Strada                 37.3  2.130   3.10    69     91     4     14.7
   Germany  VW Dasher                   30.5  2.190   3.70    78     97     4     14.1
   Japan    Datsun 810                  22.0  2.815   3.70    97    146     6     14.5
   Germany  BMW 320i                    21.5  2.600   3.64   110    121     4     12.8
   Germany  VW Rabbit                   31.9  1.925   3.78    71     89     4     14.0
   option pagesize=60 nodate;
   data corso.CARS;
   infile 'a:\acp.txt' firstobs=3;
   length TIPO $ 25;
   input NAZIONE $ TIPO $ & CONSUMO PESO DRIVE_R POTENZA
         CILINDRA NUM_C RIPRESA;
   proc print round;
   var tipo NAZIONE CONSUMO PESO DRIVE_R POTENZA
       CILINDRA NUM_C RIPRESA;
   run;
SAS OUTPUT (partial):
   OBS  TIPO                       NAZIONE  CONSUMO  PESO  DR_R  POTEN  CILIN  NUM_C  RIPRESA
     1  Buick Estate Wagon         U.S.       16.9   4.36  2.73   155    350     8     14.9
     2  Ford Country Squire Wagon  U.S.       15.5   4.05  2.26   142    351     8     14.3
     3  Chevy Malibu Wagon         U.S.       19.2   3.61  2.56   125    267     8     15.0
     4  Chrysler LeBaron Wagon     U.S.       18.5   3.94  2.45   150    360     8     13.0
     5  Chevette                   U.S.       30.0   2.16  3.70    68     98     4     16.5
     6  Toyota Corona              Japan      27.5   2.56  3.05    95    134     4     14.2
     7  Datsun 510                 Japan      27.2   2.30  3.54    97    119     4     14.7
     8  Dodge Omni                 U.S.       30.9   2.23  3.37    75    105     4     14.5
     9  Audi 5000                  Germany    20.3   2.83  3.90   103    131     5     15.9
    10  Volvo 240 GL               Sweden     17.0   3.14  3.50   125    163     6     13.6
    11  Saab 99 GLE                Sweden     21.6   2.80  3.77   115    121     4     15.7
    12  Peugeot 694 SL             France     16.2   3.41  3.58   133    163     6     15.8
    13  Buick Century Special      U.S.       20.6   3.38  2.73   105    231     6     15.8
    14  Mercury Zephyr             U.S.       20.8   3.07  3.08    85    200     6     16.7
    15  Dodge Aspen                U.S.       18.6   3.62  2.71   110    225     6     18.7
    16  AMC Concord D/L            U.S.       18.1   3.41  2.73   120    258     6     15.1
    17  Chevy Caprice Classic      U.S.       17.0   3.84  2.41   130    305     8     15.4
    18  Ford LTD                   U.S.       17.6   3.73  2.26   129    302     8     13.4
    19  Mercury Grand Marquis      U.S.       16.5   3.96  2.26   138    351     8     13.2
    20  Dodge St Regis             U.S.       18.2   3.83  2.45   135    318     8     15.2
(part of the output is not shown)
FORMATTED INPUT:
The syntax of the statement is:
INPUT pointer_1 variable_1 informat_1 ....... [pointer_n
variable_n informat_n ] ;
where:
- pointer can be one of the following:
   +n            the pointer moves forward n columns
   /             the pointer goes to the first column of the next record
   #n            the pointer goes to the first column of record n
   @n            the pointer goes to column n
   @'string'     the pointer goes to the first column after the given character string
   @@            holds the record: the data in one record refer to several observations
- variable is the valid name of one or more SAS variables
- informat can be one of the following (the most common are listed):
   w.        reads a numeric field w columns wide;
   w.d       reads a numeric field w columns wide, d of which are decimals;
   $w.       reads a character field w columns wide;
   $CHARw.   reads a character field w columns wide, keeping leading blanks;
   Ew.d      reads numeric values written in scientific (exponential) notation;
   HEXw.     reads hexadecimal values.
With formatted input, missing values are coded as a period or as blanks.
Numeric fields in which blanks stand for zeros can also be read (BZw.d informat).
Dates can be read as well (in English, Italian, ... notation).
Example
   data es1;
   input sesso $ eta :3.1 hinch wlib @@;
   altezza=hinch*2.54;
   peso=wlib*0.4536;
   datalines;
   f 143 56.3 85.0  f 155 62.3 105.0 f 153 63.3 108.0 f 161 59.0 92.0
   f 191 62.5 112.5 f 171 62.5 112.0 f 185 59.0 104.0 f 142 56.5 69.0
   f 160 62.0 94.5  f 140 53.8 68.5  f 139 61.5 104.0 f 178 61.5 103.5
   f 157 64.5 123.5 f 149 58.3 93.0  f 143 51.3 50.5  f 145 58.8 89.0
   f 191 65.3 107.0 f 150 59.5 78.5  f 147 61.3 115.0 f 180 63.3 114.0
   f 141 61.8 85.0  f 140 53.5 81.0  f 164 58.0 83.5  f 176 61.3 112.0
   ;
   run;
Example:
   INPUT NOME $ 8. @11 SESSO $ 1. +1 ETA 2. +1 ALTEZZA 3. +1 PESO 5.1;
Variables that are read with the same informat can be grouped, and so can the informats, as in
the following example:
   INPUT GEN 3. FEB 3. MAR 3.;
which can equally be written as:
   INPUT (GEN FEB MAR) (3. 3. 3.);
   INPUT (GEN FEB MAR) (3.);
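A small sketch of the grouped form, with invented data in which each value occupies exactly three columns; the data set name mesi and the variable totale are assumptions made for this illustration.

```sas
data mesi;
  input (gen feb mar) (3.);      /* the single informat 3. is reused for all three */
  totale = sum(of gen feb mar);
datalines;
 10 20 30
  5  7  9
;
run;

proc print data=mesi;
run;
```

When the informat list is shorter than the variable list, SAS goes back to the start of the informat list, which is why one 3. is enough here.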
COMBINING THE DIFFERENT STYLES OF INPUT:
Consider the following example:
DATA CLASSE;
INPUT NOME $ @ 11 SESSO $ 1. ETA 13-14 ALTEZZA @ 20 PESO 5.1 ;
DATALINES;
GIANNI    M 12 155 48.2
MARCO     M 12 151 43.7
/* other data */
;
RUN;
The variable NOME is read with list input, SESSO with formatted input, ETA with column input,
ALTEZZA with list input, and PESO with formatted input.
MULTIPLE INPUT STATEMENTS:
Each time an INPUT statement is executed, one record is read; if the data for one observation are
spread over several records, several INPUT statements are needed.
Example:
DATA CLASSE;
INPUT NOME $ 1-8 SESSO $ 11;
INPUT ETA 3-4;
INPUT ALTEZZA 1-4 PESO 6-10;
DATALINES;
GIANNI    M
  12
155  48.2
MARCO     M
  12
151  43.7
/* other data */
;
RUN;
F8. THE INFILE STATEMENT
This statement identifies a text file from which the data are to be read with the INPUT
statement. The syntax is:
   infile filename options;
The main options are:
   firstobs = number of the first record to read
   obs      = number of the last record to read
Example:
   data pippo;
   infile 'a:pluto.txt' firstobs=3 obs=10;
   input x1-x5;
   run;
The records from the third to the tenth are read.
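The same options can be tried without an external file by pointing INFILE at the in-stream data. This is only a sketch: the data set name esempio and the comma-separated lines are invented, and DELIMITER= is the option mentioned in the section on list input.

```sas
data esempio;
  /* skip the header line (record 1) and stop after record 4 */
  infile datalines delimiter=',' firstobs=2 obs=4;
  input x1-x3;
datalines;
x1,x2,x3
1,2,3
4,5,6
7,8,9
10,11,12
;
run;

proc print data=esempio;
run;
```

Under these options three observations should be created, from the second, third and fourth records.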
F9. THE OUTPUT STATEMENT
When an OUTPUT statement appears in a DATA step, SAS adds observations to the data set only
when the OUTPUT statement is executed. In other words, this statement suppresses the implicit
write that SAS performs at the end of each iteration of the DATA step.
The syntax of the statement is:
OUTPUT [name of one or more SAS data sets] ;
When SAS meets an OUTPUT statement it:
- writes the current observation to the specified data set(s);
- continues the DATA step with the statement that follows the OUTPUT statement.
The OUTPUT statement is needed when:
- several observations must be created from a single line of input data;
- several SAS data sets must be created from a single input data set;
- a single observation must be created by combining data from several input lines.
Here are two examples (based on the DISNEY data set built earlier):
   data corso.maschi corso.femmine;
   set corso.disney;
   if sesso='m' then output corso.maschi;
   else if sesso='f' then output corso.femmine;
   else put 'osservazioni sbagliate' _all_;
   proc print data= corso.maschi;
   proc print data= corso.femmine;

   data corso.maschi;
   set corso.disney;
   if sesso='m' then output;
   run;
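The first use listed above, several observations from one input line, might look like the following sketch. The data set name lungo, the variables ora and temperatura and the temporary array of hours are assumptions made for this illustration; the temperatures are those of the earlier example.

```sas
data lungo;
  input citta $ t6 t12 t18;
  array t{3} t6 t12 t18;
  array h{3} _temporary_ (6 12 18);   /* hour attached to each reading */
  do i = 1 to 3;
    ora = h{i};
    temperatura = t{i};
    output;                           /* one observation per reading   */
  end;
  keep citta ora temperatura;
datalines;
Genova 19 24 22
Milano 15 18 18
;
run;

proc print data=lungo;
run;
```

Each input line should produce three observations, so the two lines give six in total.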
F10. SCRITTURA SU UN FILE ESTERNO E ISTRUZIONE PUT
Per scrivere dati in formato testo su disco e sufficiente usare le seguenti istruzioni:
DATA _NULL_;
non costruisce nessun Data Set
SET nome Data Set SAS;
lettura dei dati in ingresso
(si può usare anche INPUT)
FILE '[path] nome ';
apre il file di uscita
PUT [specifiche];
scrittura sul file
The PUT statement is the counterpart of INPUT for writing: it specifies what is written on each line and how. Its general form is:
PUT [specifications ];
where the specifications indicate the style to use, which can be:
- list
- column
- formatted
- string
- guided
(only the first three are of interest here)
As with INPUT, different styles can be mixed within the same PUT statement.
If no external file is defined, PUT prints the data in the Log window.
A single DATA step can create several files on disk containing the results of the analysis. Each of these files must be "pointed to" by a FILE statement, whose general syntax is the one given above.
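As a minimal sketch of the four statements above, the step below writes the DISNEY Data Set to a text file using column-style PUT. The file name disney.txt and the column positions are assumptions chosen for illustration.

```sas
data _null_;                             /* no Data Set is created */
  set corso.disney;                      /* read the input observations */
  file 'disney.txt';                     /* hypothetical output file */
  /* column style: each variable is written in fixed columns */
  put nome $ 1-15 sesso $ 17 eta 19-21 altezza 23-25 peso 27-29;
run;
```

Removing the FILE statement would send the same lines to the Log window instead.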
G. THE PROC STEP
The general form of a procedure call is:
PROC procedure name [DATA = SAS Data Set name ];
G1. SOME STATEMENTS USED IN A PROC STEP
BY [DESCENDING] variables [NOTSORTED];
with this statement the data are processed by groups, optionally in descending order or in order of appearance.
NOTE: the data must already be sorted by the BY variables; if they are grouped but not in sorted order, the NOTSORTED option must be used.
OUTPUT [OUT = SAS Data Set name ] [keyword = name ] ;
creates a Data Set containing the results of a procedure.
TITLE ['text '];
defines the text printed at the top of every output page.
FOOTNOTE ['text '];
defines the text printed at the bottom of every output page.
FORMAT variables format;
associates a "format" with a variable, defining how the values of that variable are presented in the output.
LABEL variable = 'label ';
associates a label with a variable, to make the results produced by the various output procedures easier to read.
TITLE and FOOTNOTE remain in effect in subsequent PROC steps until they are redefined or cancelled.
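A short sketch of how these statements combine in a single PROC step. The title, footnote and label texts are invented for illustration; the Data Set and variables are those of the DISNEY examples.

```sas
title 'Personaggi Disney';                 /* illustrative title text */
footnote 'Data set corso.disney';          /* illustrative footnote text */
proc print data=corso.disney label;        /* LABEL option: print labels instead of names */
  var nome altezza peso;
  label altezza='Altezza' peso='Peso';
  format altezza peso 6.1;                 /* one decimal place in the output */
run;
title;                                     /* cancel the title for later steps */
footnote;                                  /* cancel the footnote for later steps */
```

In PROC PRINT the labels are shown only when the LABEL option is given in the PROC statement; most other procedures use them automatically.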
SELECTING THE VARIABLES AND OBSERVATIONS ON WHICH A PROCEDURE OPERATES
VAR variables;
indicates the variables on which the procedure operates.
WHERE expression;
selects observations from the Data Set on which the procedure operates.
Example
SAS PROGRAM N. 25 bis:
proc print data=a.cars;
var nazione tipo peso;
where nazione='Germany';
run;
SAS OUTPUT:
Obs    NAZIONE    TIPO            PESO
  9    Germany    Audi 5000       1.28369
 26    Germany    VW Scirocco     0.90266
 35    Germany    VW Dasher       0.99338
 37    Germany    BMW 320i        1.17936
 38    Germany    VW Rabbit       0.87318
Note that the Obs column keeps the observation number from the original SAS Data Set.
G2. THE SORT PROCEDURE
This procedure sorts numeric or character values (character values according to the ASCII code).
The syntax is:
PROC SORT [options] ;
BY [DESCENDING] variable [DESCENDING] variable ...;
The main options are:
- DATA = SAS Data Set name
specifies the name of the Data Set to sort. If omitted, SAS uses the most recently created one.
- OUT = SAS Data Set name
the name of the output Data Set; if not specified, the sorted Data Set overwrites the original one (provided there are no errors).
The BY statement must always be specified. The DESCENDING option indicates that the variable that follows it is sorted in descending order. When several variables are listed in BY, sorting starts from the first one and uses the following ones to break ties.
Example:
proc sort data= corso.disney out=dis;
by sesso eta;
proc print; run;
SAS OUTPUT:
OBS    NOME            SESSO    ETA    ALTEZZA    PESO
  1    emy               f        8      117       25
  2    ely               f        8      117       25
  3    edy               f        8      117       25
  4    clarabella        f       30      180       65
  5    minnie            f       35      145       40
  6    nonna papera      f       99      140       55
  7    qui               m        8      120       30
  8    quo               m        8      120       30
  9    qua               m        8      120       30
 10    pippo             m       32      190       54
 11    paperino          m       34      150       50
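The DESCENDING keyword applies only to the variable that immediately follows it. A small sketch on the same data (the output Data Set name dis_desc is an assumption):

```sas
/* sesso in ascending order; within each sex, eta from oldest to youngest */
proc sort data=corso.disney out=dis_desc;   /* dis_desc: illustrative name */
  by sesso descending eta;
run;
proc print data=dis_desc noobs;             /* NOOBS drops the OBS column */
run;
```

Because OUT= is given, corso.disney itself is left in its original order.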
G3. THE PRINT PROCEDURE
This procedure displays all or some of the variables of a SAS Data Set.
Some features of the procedure are:
- the information to be printed is automatically formatted to fit the page size.
- the variables are printed in columns identified by the variable name or by the label associated with it; unless the user requests otherwise, a column headed OBS is also printed, containing a sequential number that identifies the observations.
- observations are printed one per row whenever possible.
The syntax of the procedure is:
PROC PRINT [options ];
The main options are:
- DATA = SAS Data Set name
specifies the name of the Data Set to print. If omitted, SAS uses the most recently created one.
- NOOBS
removes the column with the observation numbers from the output
- ROUND
rounds the data to two decimal places
The main statements of the PRINT procedure are:
- VAR variable list ;
indicates which variables are printed and in what order. If omitted, all variables are printed in the order in which they are stored in the SAS Data Set.
- BY variable list ;
prints the observations in separate groups defined by the BY variables (note: the data must already be sorted).
Example:
data corso.dis_eta;
set corso.disney;
e='giovane';
if eta>10 then e='vecchio';
proc sort data=corso.dis_eta out=eta1;
by e sesso;
proc print;
by e sesso;
run;
SAS OUTPUT:
--------------------------- E=giovane SESSO=f ---------------------------
OBS    NOME            ETA    ALTEZZA    PESO
  1    emy               8      117       25
  2    ely               8      117       25
  3    edy               8      117       25
--------------------------- E=giovane SESSO=m ---------------------------
OBS    NOME            ETA    ALTEZZA    PESO
  4    qui               8      120       30
  5    quo               8      120       30
  6    qua               8      120       30
--------------------------- E=vecchio SESSO=f ---------------------------
OBS    NOME            ETA    ALTEZZA    PESO
  7    minnie           35      145       40
  8    clarabella       30      180       65
  9    nonna papera     99      140       55
--------------------------- E=vecchio SESSO=m ---------------------------
OBS    NOME            ETA    ALTEZZA    PESO
 10    pippo            32      190       54
 11    paperino         34      150       50
- SUM variable list ;
indicates that the totals of the listed variables are to be printed.
- SUMBY variable ;
used only when both a BY statement and a SUM statement with several variables are present; the variable specified must appear in the BY statement. Each time that variable changes value, the totals of the variables listed in the SUM statement are printed.
Example:
proc sort data= corso.disney out=dis;
by sesso;
proc print data=dis;
by sesso;
sum eta altezza peso;
sumby sesso; run;
SAS OUTPUT:
---------------------------- SESSO=f ----------------------------
OBS      NOME            ETA    ALTEZZA    PESO
  1      minnie           35      145       40
  2      clarabella       30      180       65
  3      nonna papera     99      140       55
  4      emy               8      117       25
  5      ely               8      117       25
  6      edy               8      117       25
                         ---    -------    ----
SESSO                    188      816      235
---------------------------- SESSO=m ----------------------------
OBS      NOME            ETA    ALTEZZA    PESO
  7      pippo            32      190       54
  8      paperino         34      150       50
  9      qui               8      120       30
 10      quo               8      120       30
 11      qua               8      120       30
                         ---    -------    ----
SESSO                     90      700      194
                         ===    =======    ====
                         278     1516      429
G4. THE MEANS PROCEDURE
The procedure produces statistics on all the observations of a SAS Data Set.
PROC MEANS [options ] ;
Some options are:
- DATA = SAS Data Set name
specifies the Data Set on which the statistics are computed. If omitted, the most recently created one is used.
- NOPRINT
suppresses the printing of the results in the output window
- MAXDEC = n
indicates the number of decimal places wanted.
- VARDEF = DF | N | ...
specifies the denominator of the variance formula. The default is DF = (number of data values - 1).
Example:
proc means data= corso.disney;
var altezza peso; run;
proc means data= corso.disney maxdec=2;
var altezza peso; run;
SAS OUTPUT:
Variable     N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------
ALTEZZA     11    137.8181818     26.3811227    117.0000000    190.0000000
PESO        11     39.0000000     14.5258390     25.0000000     65.0000000
--------------------------------------------------------------------------
Variable     N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------
ALTEZZA     11         137.82          26.38         117.00         190.00
PESO        11          39.00          14.53          25.00          65.00
--------------------------------------------------------------------------
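proc means data= corso.disney;
var altezza peso;
output out=medie mean=m_alt m_peso;  /* OUTPUT statement missing in this transcript; the name medie is an assumption, m_alt and m_peso are taken from the printed output */
proc print; run;
SAS OUTPUT (of proc print):
OBS    _TYPE_    _FREQ_     M_ALT     M_PESO
  1       0        11      137.818      39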
The following statistics can be requested (they must be specified as options when a set different from the default is wanted; the default statistics are N, MEAN, STD, MIN and MAX):
N         number of observations excluding missing values
MEAN      arithmetic mean
MIN       minimum
MAX       maximum
STD       standard deviation
NMISS     number of missing values
NOBS      total number of observations
SUM       sum
RANGE     maximum - minimum
VAR       variance
STDERR    standard error
CSS       corrected sum of squares
USS       uncorrected sum of squares
CV        coefficient of variation
T         Student's t
Some statements that can be used with PROC MEANS:
BY variable list ;
(the data must, however, have been sorted beforehand with PROC SORT).
CLASS variable list ;
computes the statistics separately for each combination of values of the listed variables; unlike BY, it does not require sorted data.
FREQ variable ;
in computing the statistics each observation is counted with a frequency equal to n, where n is the value of the FREQ variable. If the value of the FREQ variable is less than 1, the observation is not used in the calculations. If the value is not an integer, only its integer part is used.
OUTPUT OUT = SAS Data Set keywords = names...;
writes the statistics to a new Data Set. The keywords specify which statistics are wanted in the new Data Set and the names of the variables that will contain them. The allowed keywords are those listed above.
VAR variable list ;
indicates the variables on which the statistics are computed.
proc sort data= corso.disney out= dis;
by sesso;
proc means data=dis maxdec=2;
var altezza peso;
by sesso;
output out=medie mean=m_alt m_peso;  /* OUTPUT statement missing in this transcript; the name medie is an assumption, m_alt and m_peso are taken from the printed output */
proc print; run;
SAS OUTPUT:
------------------------------- SESSO=f -------------------------------
Variable     N       Mean    Std Dev    Minimum    Maximum
----------------------------------------------------------
ALTEZZA      6     136.00      24.96     117.00     180.00
PESO         6      39.17      17.44      25.00      65.00
----------------------------------------------------------
------------------------------- SESSO=m -------------------------------
Variable     N       Mean    Std Dev    Minimum    Maximum
----------------------------------------------------------
ALTEZZA      5     140.00      30.82     120.00     190.00
PESO         5      38.80      12.13      30.00      54.00
----------------------------------------------------------
OBS    SESSO    _TYPE_    _FREQ_    M_ALT     M_PESO
  1      f         0         6       136     39.1667
  2      m         0         5       140     38.8000
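The CLASS and OUTPUT statements can be combined to obtain the group statistics without sorting first. The following is a sketch only: the Data Set name stat and the variable names m_alt, m_peso, s_alt and s_peso are illustrative choices.

```sas
proc means data=corso.disney noprint;      /* NOPRINT: only the output Data Set is wanted */
  class sesso;                             /* groups defined without a prior PROC SORT */
  var altezza peso;
  output out=stat                          /* stat: illustrative name */
         mean=m_alt m_peso                 /* means of altezza and peso, in VAR order */
         std=s_alt s_peso;                 /* standard deviations, same order */
run;
proc print data=stat; run;
```

With CLASS, the output Data Set also contains an overall row (_TYPE_=0) in addition to one row per value of sesso (_TYPE_=1), whereas the BY version produces only the per-group rows.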
G5. THE FREQ PROCEDURE
Counts the frequencies of values and produces one-way or n-way tables with frequencies, cumulative frequencies, percentages and cumulative percentages.
The syntax is:
PROC FREQ [options ];
The available options are:
- DATA = SAS Data Set name
specifies the Data Set on which the statistics are computed. If omitted, the most recently created one is used.
- ORDER=FREQ | DATA | INTERNAL | FORMATTED
specifies the order in which the levels of the variable are reported (with FREQ, the levels are ordered by decreasing frequency; with DATA, in the order in which they appear in the input data; with INTERNAL, by internal value, which is the default; with FORMATTED, by external formatted value).
- PAGE
one table is printed per page
Some statements used with FREQ:
- BY variables ;
can be used to obtain separate analyses for the groups defined.
- TABLES requests / options ;
(note: requests and options must be separated by / )
specifies the variables on which to build the frequency table and its dimensions.
A request consists of one or more variables joined by asterisks. For example:
TABLES ETA SESSO ;       builds 2 one-way tables
TABLES ETA * SESSO ;     builds 1 two-way table
If TABLES is omitted, one-way tables are built for all variables.
Abbreviations:
TABLES A*(B C);          is equivalent to    TABLES A*B A*C;
TABLES (A B)*(C D);      is equivalent to    TABLES A*C A*D B*C B*D;
TABLES (A B C)*D;        is equivalent to    TABLES A*D B*D C*D;
TABLES (A--C);           is equivalent to    TABLES A B C;
TABLES (A--C)*D;         is equivalent to    TABLES A*D B*D C*D;
TABLES A--C*D;           is ambiguous and cannot be used
TABLES options:
- MISSING                   treats missing values as a valid class for the analysis.
- OUT = SAS Data Set name   writes the last (rightmost) table listed in the requests to a Data Set.
- CHISQ                     requests a chi-square test
- NOFREQ                    suppresses the printing of frequencies
- NOPERCENT                 suppresses the printing of percentages
- NOROW                     suppresses row percentages
- NOCOL                     suppresses column percentages
- NOCUM                     suppresses the printing of cumulative frequencies and percentages
- NOPRINT                   suppresses the printing of the tables
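As a hedged illustration of how the TABLES options are combined after the slash, the step below requests a two-way table with only the cell frequencies and a chi-square test. It assumes the Data Set corso.dis_eta built in section G3 exists.

```sas
proc freq data=corso.dis_eta;
  /* everything after the slash is an option of this TABLES statement */
  tables e*sesso / chisq norow nocol nopercent;
run;
```

With only 11 observations the expected cell counts are small, so SAS is likely to warn that the chi-square approximation may not be valid; the example is meant to show the syntax rather than a meaningful test.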
proc freq data= corso.es3;
tables altezza; run;
SAS OUTPUT:
                                   Cumulative    Cumulative
ALTEZZA    Frequency    Percent     Frequency      Percent
-----------------------------------------------------------
128.27         1          0.4           1            0.4
130.302        1          0.4           2            0.8
130.81         1          0.4           3            1.2
133.35         1          0.4           4            1.6
134.112        1          0.4           5            2.0
135.382        1          0.4           6            2.3
135.89         1          0.4           7            2.7
136.652        1          0.4           8            3.1
138.43         2          0.8          10            3.9
...
173.99         1          0.4         248           96.9
175.26         1          0.4         249           97.3
176.53         2          0.8         251           98.0
177.292        1          0.4         252           98.4
180.34         2          0.8         254           99.2
182.88         2          0.8         256          100.0
The (sorted) values taken by the specified variable, together with the corresponding absolute and relative frequencies, can be saved in a SAS Data Set with the out=Data Set name option of the TABLES statement.
proc freq data= corso.es3;
tables altezza / out=freq_alt;
proc print; run;
SAS OUTPUT of PROC PRINT:
OBS    ALTEZZA    COUNT    PERCENT
  1    128.270      1      0.39063
  2    130.302      1      0.39063
  3    130.810      1      0.39063
  4    133.350      1      0.39063
...
 65    176.530      2      0.78125
 66    177.292      1      0.39063
 67    180.340      2      0.78125
 68    182.880      2      0.78125
To save the cumulative frequencies as well (absolute or relative), proceed as follows:
data freq_al2;
set freq_alt;
fc_ass+count;
fc_perc+percent;
proc print; run;
SAS OUTPUT:
OBS    ALTEZZA    COUNT    PERCENT    FC_ASS    FC_PERC
  1    128.270      1      0.39063       1       0.3906
  2    130.302      1      0.39063       2       0.7813
  3    130.810      1      0.39063       3       1.1719
  4    133.350      1      0.39063       4       1.5625
...
 65    176.530      2      0.78125     251      98.047
 66    177.292      1      0.39063     252      98.438
 67    180.340      2      0.78125     254      99.219
 68    182.880      2      0.78125     256     100.000
PROC FREQ is used above all to build two-way or multi-way contingency tables (note: this makes sense for qualitative variables, or for quantitative variables that take a "small" number of values).
proc freq data=corso.dis_eta;
tables e*sesso; run;
SAS OUTPUT:
TABLE OF E BY SESSO

E         SESSO
Frequency|
Percent  |
Row Pct  |
Col Pct  |f       |m       |  Total
---------+--------+--------+
giovane  |      3 |      3 |      6
         |  27.27 |  27.27 |  54.55
         |  50.00 |  50.00 |
         |  50.00 |  60.00 |
---------+--------+--------+
vecchio  |      3 |      2 |      5
         |  27.27 |  18.18 |  45.45
         |  60.00 |  40.00 |
         |  50.00 |  40.00 |
---------+--------+--------+
Total           6        5       11
            54.55    45.45   100.00
DIVIDING A QUANTITATIVE VARIABLE INTO CLASSES WITH THE FORMAT PROCEDURE
Warning: from a statistical point of view, dividing a quantitative variable into classes is often an ARBITRARY operation that can distort the results.
proc format ;
value c_eta
low - 16 = 'giovani'
16 - 18 = 'medi'
18 - high = 'vecchi';
value c_alt
low - 145 = 'bassi'
145 - 165 = 'medi'
165 - high = 'alti';
value c_peso
low - 40 = 'magri'
40 - 60 = 'medi'
60 - high = 'grassi';
run;
To obtain intervals closed on the left (and open on the right), one would write:
value c_peso
low -< 40 = 'magri'
40 -< 60 = 'medi'
60 -< high = 'grassi';
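To see how the boundary values are classified, a format can be applied directly with the PUT function. The sketch below is self-contained and uses invented names (the format cl_peso, the Data Set prova) and invented test values; the last range is written as 60 - high so that the highest value is certainly included.

```sas
proc format;
  value cl_peso                 /* illustrative name, so that c_peso is not replaced */
    low -< 40 = 'magri'         /* 40 excluded from this class */
    40  -< 60 = 'medi'          /* 40 included, 60 excluded */
    60 - high = 'grassi';       /* 60 included */
run;
data prova;                     /* hypothetical test values around the class limits */
  input peso @@;
  classe = put(peso, cl_peso.); /* character variable holding the class label */
  datalines;
39.9 40 59.9 60
;
run;
proc print data=prova; run;
```

With these ranges 39.9 is classified as magri, 40 and 59.9 as medi, and 60 as grassi, which confirms that each interval is closed on the left.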
FREQ PROCEDURE:
proc freq data=corso.es3;
format peso c_peso.
eta c_eta.
altezza c_alt.;
tables eta*(peso altezza) / norow nocol nofreq;run;
SAS OUTPUT:
TABLE OF ETA BY PESO

ETA       PESO
Percent |magri   |medi    |grassi  |  Total
--------+--------+--------+--------+
giovani |  24.22 |  22.66 |   0.39 |  47.27
--------+--------+--------+--------+
medi    |   5.47 |  24.22 |   1.56 |  31.25
--------+--------+--------+--------+
vecchi  |   0.39 |  14.84 |   6.25 |  21.48
--------+--------+--------+--------+
Total         77      158       21      256
           30.08    61.72     8.20   100.00

TABLE OF ETA BY ALTEZZA

ETA       ALTEZZA
Percent |bassi   |medi    |alti    |  Total
--------+--------+--------+--------+
giovani |  13.67 |  31.64 |   1.95 |  47.27
--------+--------+--------+--------+
medi    |   1.56 |  23.44 |   6.25 |  31.25
--------+--------+--------+--------+
vecchi  |   0.78 |   8.59 |  12.11 |  21.48
--------+--------+--------+--------+
Total         41      163       52      256
           16.02    63.67    20.31   100.00
G6. THE UNIVARIATE PROCEDURE
The procedure produces descriptive statistics for numeric variables, such as extreme values, quantiles, frequency tables, box plots and stem-and-leaf plots.
PROC UNIVARIATE [options ];
- NOPRINT
suppresses all printed output. It can be used when only a new Data Set is wanted.
- PLOT
produces a stem-and-leaf plot, a box plot and a normal probability plot, which compares the empirical distribution with the Normal distribution
- FREQ
creates a frequency table with the values of the variable, their frequencies, percentages and cumulative percentages
- NORMAL
performs a test of whether the data come from a Normal distribution
- PCTLDEF = value
specifies the formula used to compute the quantiles.
- VARDEF = DF | WEIGHT | N | WDF
specifies the divisor used in computing the variance (DF means the degrees of freedom, N-1; WEIGHT means the sum of the weights; N means the number of observations; WDF means the sum of the weights minus 1). The default is DF.
- ROUND = units
specifies the units used to round the values of the variables
To compute percentiles in addition to the default ones listed below (these options are given in the OUTPUT statement):
- PCTLPTS = percentile values
specifies which percentiles are to be computed
- PCTLPRE = variable prefixes
specifies the prefixes used in the output Data Set for the variables containing the new percentiles
- PCTLNAME = percentile names
specifies the names of the percentiles
For each variable the procedure prints:
variable name, label, number of observations, weighted sum and sum of the weights of the observations, mean, sum, standard deviation, variance, Skewness, Kurtosis, USS, CSS, standard error of the mean, tests of the hypothesis mean = 0, number of non-zero observations, maximum, minimum, quartiles (Q1, median, Q3), interquartile range (Q3-Q1), range, mode, percentiles (1st, 5th, 10th, 90th, 95th, 99th), missing values (symbol, count, percentage of the total number of observations), ... .
Some statements used with the UNIVARIATE procedure are:
- VAR variables ;
the statistics are computed for the variables specified
- BY variables ;
used to obtain separate analyses on groups of observations
- FREQ variable ;
each observation analysed in the Data Set is taken to represent n observations, where n is the value of the variable specified in the FREQ statement
- ID variable list ;
the variables specified replace the column headed OBS
- WEIGHT variable ;
when specified, PROC UNIVARIATE uses the value of the variable given after WEIGHT to compute the weighted mean and the weighted variance
- OUTPUT OUT = SAS Data Set keywords = names...;
writes the statistics to a new Data Set. The keywords specify which statistics are wanted in the new Data Set and the names of the variables that will contain them
The allowed keywords are:
N NMISS NOBS MEAN SUM STD VAR SKEWNESS KURTOSIS SUMWGT
MAX MIN RANGE Q3 MEDIAN Q1 QRANGE P1 P5 P10 P90 P95
P99 MODE SIGNRANK NORMAL
(with the meanings already given)
proc univariate data=corso.es3 plot;
var altezza;
run;
SAS OUTPUT:
The UNIVARIATE Procedure
Variable: ALTEZZA

Moments
N                        256      Sum Weights              256
Mean              155.890516      Sum Observations   39907.972
Std Deviation      10.229599      Variance          104.644695
Skewness          -0.0091442      Kurtosis            -0.19828
Uncorrected SS    6247958.73      Corrected SS      26684.3972
Coeff Variation   6.56204063      Std Error Mean    0.63934994

Basic Statistical Measures
Location                  Variability
Mean      155.8905        Std Deviation           10.22960
Median    156.2100        Variance               104.64470
Mode      156.2100        Range                   54.61000
                          Interquartile Range     15.24000

Tests for Location: Mu0=0
Test            -Statistic-      -----p Value------
Student's t     t   243.8266     Pr > |t|     <.0001
Sign            M        128     Pr >= |M|    <.0001
Signed Rank     S      16448     Pr >= |S|    <.0001

Quantiles (Definition 5)
Quantile        Estimate
100% Max         182.880
99%              180.340
95%              171.450
90%              168.910
75% Q3           163.322
50% Median       156.210
25% Q1           148.082
10%              143.002
5%               139.700
1%               130.810
0% Min           128.270

Extreme Observations
------Lowest------        ------Highest-----
  Value      Obs            Value      Obs
128.270      116          177.292      183
130.302       11          180.340      176
130.810       95          180.340      181
133.350      174          182.880      213
134.112       43          182.880      241

Stem Leaf                                             #   Boxplot
  18 0033                                             4      |
  17 5777                                             4      |
  17 00011112334                                     11      |
  16 555555555566666666678888888888899999999         39      |
  16 0000000000111111111111122223333333344444        40   +-----+
  15 555566666666666666666666677777777777777888999   45   *--+--*
  15 00000011111111111222222222222344444444444444    44   |     |
  14 55556666666667777777777888888999                32   +-----+
  14 000000011222223333444444444                     27      |
  13 56788                                            5      |
  13 0134                                             4      |
  12 8                                                1      |
     ----+----+----+----+----+----+----+----+----+
  Multiply Stem.Leaf by 10**+1

Normal Probability Plot
(part of the output is not reproduced)
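The OUTPUT statement and the percentile options described above can be combined as in the following sketch. The Data Set name perc, the variable names media and mediana, the prefix p_ and the chosen percentiles are all illustrative assumptions.

```sas
proc univariate data=corso.es3 noprint;    /* NOPRINT: only the output Data Set is wanted */
  var altezza;
  output out=perc                          /* perc: illustrative name */
         mean=media median=mediana         /* keyword = name of the new variable */
         pctlpts=2.5 97.5                  /* additional percentiles to compute */
         pctlpre=p_;                       /* prefix of the percentile variables */
run;
proc print data=perc; run;
```

The result is a Data Set with a single observation; the additional percentiles are stored in variables whose names are built from the prefix and the percentile value (here p_2_5 and p_97_5).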
G7. OTHER ELEMENTARY STATISTICAL PROCEDURES
SUMMARY
computes descriptive statistics on the numeric variables of a SAS Data Set. It is similar to PROC MEANS, but by default it produces no printed output.
TABULATE
builds tables of descriptive statistics.
CORR
computes the correlation coefficients between variables.
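A minimal sketch of a PROC CORR call on the DISNEY data used in the earlier examples; the choice of variables is only illustrative.

```sas
/* Pearson correlation matrix of the three numeric variables */
proc corr data=corso.disney;
  var eta altezza peso;
run;
```

Like the other procedures in this chapter, PROC CORR accepts the VAR, BY and WHERE statements to restrict the variables and observations analysed.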
G8. SOME PROCEDURES THAT OPERATE ON SAS DATA SETS
APPEND
adds observations to the end of a SAS Data Set
COMPARE
compares the values of the variables in two SAS Data Sets
CONTENTS
prints descriptions of the contents of one or more SAS Data Sets
COPY
copies all or part of a SAS library
DATASETS
manages the Data Sets of a library (for example listing, renaming or deleting them)
TRANSPOSE
creates a new SAS Data Set by exchanging observations with variables
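Two of these procedures in a brief sketch on the DISNEY data; the output Data Set name disney_t is an assumption.

```sas
/* describe the Data Set: variables, types, lengths, number of observations */
proc contents data=corso.disney;
run;

/* exchange rows and columns: one observation per numeric variable listed in VAR */
proc transpose data=corso.disney out=disney_t;   /* disney_t: illustrative name */
  var eta altezza peso;
run;
proc print data=disney_t; run;
```

In the transposed Data Set the original variable names are stored in the automatic variable _NAME_, and the original observations become the columns COL1, COL2 and so on.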
H. GRAPHICS STATEMENTS AND PROCEDURES
H1. INTRODUCTION
The examples for the SAS/GRAPH module use data on students of the degree courses in Mathematics, Computer Science and Biology at the Università del Piemonte Orientale who attended the course "Modelli Matematici e Statistici" in the academic year 1998/1999.
The questionnaire and the coding of the qualitative variables are given below.
Questionnaire columns:
N. | SESSO (M,F) | ALTEZZA | PESO | CORSO LAUREA | NUMERO SCARPA | COLORE OCCHI | COLORE CAPELLI | ATT. SPORTIVA | DIPLOMA Superiore
1  |
Coding:
CORSO LAUREA (degree course):    M = mathematics, B = biology, I = computer science
COLORE OCCHI (eye colour):       1 = dark, 2 = green, 3 = blue
COL. CAPELLI (hair colour):      1 = dark, 2 = brown, 3 = blond
ATT. SPORTIVA (sports activity): 1 = none, 2 = moderate, 3 = high
DIPLOMA (secondary school):      1 = liceo scientifico, 2 = liceo classico, 3 = istituto tecnico, 4 = istituto magistrale, 5 = other
H2. THE GCHART PROCEDURE
The GCHART procedure produces bar charts (vertical or horizontal), three-dimensional block charts, pie charts and star charts.
Some basic information must be supplied:
- the type of chart wanted;
- the variable(s) to be analysed;
- the meaning of the height of each bar (block), or of the area of a pie slice;
- the criterion by which the values of the analysis variable(s) are grouped before being charted.
The syntax of the procedure is:
PROC GCHART [options ];
BY variables ;
statements specifying the type of chart:
VBAR variables / [options ] ;     vertical bar charts
HBAR variables / [options ] ;     horizontal bar charts
BLOCK variables / [options ] ;    three-dimensional block charts
PIE variables / [options ] ;      pie charts
STAR variables / [options ] ;     star (polar) charts
The options of the PROC GCHART statement are:
- ANNOTATE = SAS Data Set name
specifies the Data Set to use for annotations; this Data Set must be built in a particular way
- GOUT = SAS catalog name
used to produce permanent graphics output
proc gchart data=corso.mms;
hbar claurea;
vbar claurea;
block claurea;
run;
SAS OUTPUT: (charts not reproduced)
Main options of the VBAR and HBAR statements.
- RAXIS = values | AXISn
- MAXIS = values | AXISn
specify the description of the axes (the frequency axis and the variable axis, respectively); the AXISn statement must have been defined beforehand
goption vsize=6 cm hsize=12 cm ftext=swiss htext=1;
axis1 label=(h=2) length=5 cm;
axis2 label=(h=1.5) length=3 cm;
vbar claurea / maxis=axis1 raxis=axis2;
run;
SAS OUTPUT: (chart not reproduced)
- DISCRETE
specifies that the numeric variable being analysed takes discrete values. If the option is absent, the procedure assumes that the variable is continuous and, when the MIDPOINTS= option is not given, chooses the midpoints of the classes represented by the bars or slices.
vbar scarpa;
vbar scarpa / discrete;
run;
SAS OUTPUT: (charts not reproduced)
- GROUP = variable
produces side-by-side charts, each representing the observations that have a given value of the variable
vbar sesso / group = sport discrete;
block sesso / group = sport discrete;
run;
SAS OUTPUT: (charts not reproduced)
- SUBGROUP = variable
divides each bar or block into as many parts as there are distinct values of the variable indicated
pattern1 c=black v=x3;
pattern2 c=black v=r3;
vbar sport / subgroup=sesso discrete;
run;
SAS OUTPUT: (chart not reproduced)
- PATTERNID = SUBGROUP | GROUP | MIDPOINT | BY
changes the fill "pattern" each time the variables defined by the SUBGROUP, GROUP or MIDPOINT options, or by the BY statement, change value. To define the patterns oneself, the statements PATTERN1 ... PATTERNm are used
pattern1 c=black v=x5;
pattern2 c=black v=s;
vbar sesso / group=occhi discrete;
vbar sesso / group=occhi discrete patternid=group;
run;
goption reset=(all);
SAS OUTPUT: (charts not reproduced)
The following two options are used when the GROUP option is defined; they indicate, respectively, the space to leave between the bars of the groups and the description of the group axis
- GSPACE = n
- GAXIS = AXISn
- MIDPOINTS = values
defines the set of midpoints associated with each bar or slice of the chart; these values, numeric or character, can be listed in the following forms:
1) a list of numeric values, for example: 5 10 15 20 25 30;
2) a repetitive form such as 5 TO 30 BY 5, which generates the following midpoints with step 5:
5 10 15 20 25 30;
3) a repetitive form such as 10 TO 1 BY -1, which generates the following midpoints with step -1:
10 9 8 7 6 5 4 3 2 1;
4) a list of quoted strings such as: 'Gennaio' 'Febbraio' 'Marzo' etc.
vbar altezza / midpoints=150 to 190 by 20;
vbar altezza / midpoints=150 to 190 by 10;
run;
data part;
input p @@;
datalines;
117 117 118 119 121 122 122 126 127 128
128 128 129 129 129 132 133 134 135 136
137 138 139 141 142 144 148 150 155 156
;
proc gchart data=part;
hbar p / midpoints= 120 135 150;
hbar p / midpoints =123 138 153;
run;
- TYPE = keyword
specifies what the size of each bar or slice of the chart is proportional to; the keyword can be:
- FREQ                  the frequency with which a value occurs
- PERCENT               the percentage of observations that take a given value
- CFREQ or CPERCENT     the cumulative frequency or the cumulative percentage, respectively
- SUM                   each bar represents the sum of the values of the variable specified in the SUMVAR= option
- MEAN                  the mean of the values of the variable specified in the SUMVAR= option
- SUMVAR = variable
the name of a variable whose values are processed according to the TYPE= option.
Examples of the use of SUMVAR and TYPE:
vbar scarpa / discrete type=mean sumvar=altezza;
vbar scarpa / discrete type=mean sumvar=altezza group=sesso;
run;
SAS OUTPUT: (charts not reproduced)
- LEVELS = n
when the variable shown in the bar chart is continuous, this option specifies that the chart must have n bars
vbar altezza / levels=7;
run;
- REF = list
specifies a list of reference lines to be drawn on the frequency axis
- SPACE = n
specifies the space between the bars
- WIDTH = n
specifies the width of the bars
vbar diploma / discrete ref= 10 30 space=0 width=5;
run;
SAS OUTPUT: (chart not reproduced)
Further options for VBAR and HBAR:
- CTEXT = colour
- COUTLINE = colour
specify the colour of the text and of the outlines, respectively.
- MISSING
specifies that missing values are treated as valid values with their own bars or slices.
- CAXIS = colour
specifies the colour of the axes
- G100
used when the GROUP option is present; it forces the bars to total 100 % within each group
- LEGEND = LEGENDn
specifies the legend to associate with each chart (when one is expected, for example if the SUBGROUP option is specified); the LEGENDn statement must have been defined beforehand
- NOLEGEND
omits the legend used for each subgroup
- ASCENDING
- DESCENDING
print the bars in ascending (resp. descending) order of frequency
- FRAME
- CFRAME = colour
the first draws a border around the area containing the chart; if the CFRAME option is absent, the border has the same colour as the axes
- NOZEROS
suppresses every bar whose value is zero
- NOAXIS
suppresses the printing of the axes
- MINOR = n
specifies the number of minor tick marks printed between the major tick marks on the frequency axis
Further options for HBAR:
- NOSTATS
specifies that no statistics are printed in a horizontal bar chart
- FREQ and CFREQ
print beside the chart the frequency corresponding to each bar (with CFREQ, the cumulative frequencies)
- PERCENT and CPERCENT
print the percentage for each bar (with CPERCENT, the cumulative percentages)
- SUM and MEAN
print, respectively, the sum and the mean of the SUMVAR= variable over the observations represented by each bar
- NOSYMBOL
omits the legend of the symbols used for each subgroup
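The statistics options of HBAR can be tried with a sketch like the following, which assumes the Data Set corso.mms and the variable claurea used in the earlier GCHART examples.

```sas
proc gchart data=corso.mms;
  hbar claurea / freq percent;      /* print only frequency and percentage beside the bars */
  hbar claurea / nostats;           /* same chart with no statistics printed */
  pie  claurea / type=percent;      /* slice size proportional to the percentage */
run;
quit;                               /* GCHART stays active until QUIT or a new step */
```

Without any of these options, HBAR prints the frequency, cumulative frequency, percentage and cumulative percentage beside each bar.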
H3. THE GPLOT PROCEDURE
The procedure produces Cartesian plots, that is, it represents in the plane how one variable changes as another one varies.
PROC GPLOT [options ] ;
The options of the procedure are:
- ANNOTATE = SAS Data Set name
- GOUT = SAS catalog name
used as in the GCHART procedure
- UNIFORM
specifies that the axes use the same scale when the BY statement is present (so that the plots for the different levels of the BY variable can be compared)
The main statements of the procedure are:
- BY variable ;
with the usual meaning
- PLOT requests / options ;
- PLOT2 requests / options ;
where the requests are:
1) a list of: var_vert * var_horiz
2) a list of: var_vert * var_horiz = n
3) a list of: var_vert * var_horiz = variable
They specify the (vertical and horizontal) variables to display on the plot (1), using the n-th symbol, optionally defined with the SYMBOLn statement (2), or using a third variable whose values identify each plotted point (3).
proc gplot data=corso.mms;
plot altezza*peso=1;
run;
SAS OUTPUT: (plot not reproduced)
Several plots can be requested at once (one per page unless the OVERLAY option, described below, is used).
The PLOT2 statement generates a second vertical axis on the right of the plots produced by the PLOT statement. It has the same syntax as the PLOT statement.
The main options of the PLOT statement are:
General options:
- ANNOTATE = SAS Data Set name
- AREAS = n
specifies which areas above or below the plotted lines are to be filled; the areas are numbered from the bottom up (the area between the horizontal axis and the "lowest" curve is area 1, the area between that curve and the next "higher" one is area 2, and so on); the fill patterns can be customised with the statements PATTERN1 ... PATTERNn.
- LEGEND = LEGENDn
- NOLEGEND
the first option specifies the legend to associate with each plot (when one is expected); the LEGENDn statement must have been defined beforehand; the second option omits the legend
- OVERLAY
produces overlaid plots
- SKIPMISS
breaks the line joining the points wherever there are missing values
For reference lines:
- AUTOHREF
- AUTOVREF
automatically draw reference lines at the major tick marks
- HREF = values
- VREF = values
draw reference lines at the values indicated on the horizontal axis (HREF, vertical lines) or on the vertical axis (VREF, horizontal lines)
- CHREF = colour
- CVREF = colour
- LHREF = n
- LVREF = n
specify the colours and the line types of the reference lines
- GRID
produces a "grid"
To define the axes:
- NOAXIS
suppresses the printing of the axes, the labels and the values
- CAXIS = colour
- CTEXT = colour
specify the colour of the axis lines and of the text on the axes, respectively
- FRAME
- CFRAME = colour
the first draws a border around the area containing the plot; if the CFRAME option is absent, the border has the same colour as the axes
- HAXIS = values | AXISn
- VAXIS = values | AXISn
specify the tick marks of the horizontal (resp. vertical) axis; if AXISn is used, the corresponding statement must be present
Examples when values are used:
PLOT y*x / VAXIS=10 TO 100 BY 5;
PLOT y*x / VAXIS=10 100 1000 10000;
- HMINOR = n
- VMINOR = n
specify the number of minor tick marks drawn between the major tick marks
- HZERO
- VZERO
request that the tick marks on the horizontal (resp. vertical) axis start from the origin
- VREVERSE
specifies that the order of the values on the vertical axis is reversed
H4. SOME STATEMENTS FOR GRAPHICAL OUTPUT
The following statements customize the graphical output.
The statement:
GOPTIONS options ;
sets the defaults for graphical output.
It must be placed before the graphics procedures and stays in effect until a later GOPTIONS
statement changes it.
- for text:
CTEXT = color
FTEXT = font
HTEXT = n
(later statements can modify specific pieces of text)
- for titles:
CTITLE = color
FTITLE = font
HTITLE = n
(later statements can modify specific titles)
Valid colors:
B blue
R red
G green
C cyan
P pink
Y yellow
W white
K black
M magenta
A gray|grey
N brown
O orange
The font tables are on pages 166-174 of the SAS/GRAPH manual, vol. 1.
- for the size of the graphical output:
VSIZE = n [IN | CM | CELL | PCT]
HSIZE = n [IN | CM | CELL | PCT]
- for the type of graphics device:
DEVICE = name
for example, the name is: winprtg if the printer is grayscale
winprtc if the printer is color
- to cancel previously assigned graphics settings:
RESET = ALL | (statement, statement, ...)
it is useful to use this option at the beginning of every program.
Example:
goptions device=winprtg
vsize=15 cm
hsize=18 cm
ftext=swiss
htext=1;
The following statements can be placed either outside or inside the graphics procedures.
The next two statements work as already described for non-graphical output.
TITLEn [options] 'text' ;
FOOTNOTEn [options] 'text' ;
Some options specific to graphical output are:
- for the color
C = color
- for the font
F = font
- for the height
H = n
- for the position
D = (coordinates)
The following statement specifies the fill pattern used for histograms (lines slanting to the
right, to the left, crosshatched, empty, solid) when the histogram is subdivided by subclasses.
PATTERNn options ;
- for the type of pattern
V = value
- for the color
C = color
PATTERN TABLE: (table not reproduced)
The following statement specifies the characters used to draw the graphs and the way in which
the points are joined.
SYMBOLn options ;
- for the color
C = color
- for the font
F = font
- for the height
H = n
- for the width
W = n
- for the character plotted at the points of the graph
V = value
- for the line type (dash pattern) of the lines that join the points
L = n
- to state whether and how the points are interpolated
I = NONE | JOIN | HILO | ...
The following statement specifies how the axes are displayed.
AXISn options ;
- for the color
C = color
- for the axis label
LABEL = NONE | (C = color | H = n[IN | CM | CELL | PCT] | F = font | ...)
- for the tick mark values
VALUE = NONE | (C = color | H = n[IN | CM | CELL | PCT] | F = font | ...)
- for the line type (dash pattern) of the axes
STYLE = n
- for the width in pixels of the axis line
WIDTH = n
- for the length of the axis line
LENGTH = n [IN | CM | CELL | PCT]
- for the coordinates of the origin
ORIGIN = coordinates
- for the values at which tick marks are placed
ORDER = list
- for the major tick marks
MAJOR = NONE | (C = color | H = n[IN | CM | CELL | PCT] | N = n | W = n | ...)
- for the minor tick marks
MINOR = NONE | (C = color | H = n[IN | CM | CELL | PCT] | N = n | W = n | ...)
The following statement specifies the appearance of the legends that appear in the graphs.
LEGENDn options ;
- for the legend label
LABEL = NONE | (C = color | H = n[IN | CM | CELL | PCT] | F = font | ...)
- for the text of the legend entries
VALUE = NONE | (C = color | H = n[IN | CM | CELL | PCT] | F = font | ...)
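The statements described in this section can be combined as in the following minimal sketch. It assumes the library CORSO and the data set MMS used elsewhere in these notes, with numeric variables ALTEZZA and PESO and a classification variable SESSO that takes two values; the colors, symbols and labels are illustrative choices only.

```sas
/* Sketch only: CORSO.MMS, ALTEZZA, PESO and SESSO are taken from the notes;
   every option value below is an illustrative assumption. */
goptions reset=all ftext=swiss htext=1;

symbol1 c=blue v=dot  i=none;   /* used for the first value of SESSO  */
symbol2 c=red  v=plus i=none;   /* used for the second value of SESSO */

axis1 label=('Peso') minor=none;            /* horizontal axis            */
axis2 label=(a=90 'Altezza') minor=(n=1);   /* vertical axis, label at 90 */

legend1 label=('Sesso') frame;

proc gplot data=corso.mms;
   plot altezza*peso=sesso / haxis=axis1 vaxis=axis2 legend=legend1 frame;
run;
quit;
```

Because SYMBOLn, AXISn and LEGENDn are global statements, they stay in effect for later procedures until they are redefined or cancelled with GOPTIONS RESET=.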
H5. The BOXPLOT procedure:
SAS program no. 50:
proc sort data=corso.mms out=mms;
by sesso;
run;
proc boxplot data=mms;
plot altezza*sesso/boxstyle = schematicid;
run;
proc sort data=corso.mms out=mms;
by sport;
run;
proc boxplot data=mms;
plot altezza*sport/boxstyle = schematicid;
run;
quit;
SAS OUTPUT: (graphs not reproduced)
A complete example of the use of the GPLOT procedure:
libname corso 'a:\mms';
goptions reset=all;
goptions
device=winprtg
hsize=18 cm
vsize=25 cm
ftext=swiss
htext=1;
goptions reset=(symbol,axis,title);
symbol1 c=blue f=swissb v=-;
symbol2 c=green f=swissb v=@;
symbol3 c=red v=dot;
symbol4 c=pink f=swissb v=x;
symbol5 c=yellow f=swissb v=#;
symbol6 c=cyan f=swissb v=+;
proc gplot data=corso.ocp2;
plot PRIN2*PRIN1=NAZIONE/
href=0
vref=0
frame ;
title h=1.5 'Carta delle osservazioni';
run;
data ANNOTA;
set corso.MATP;
x=COL1;
y=COL2;
xsys='2';
ysys='2';
text=_NAME_;
size=1;
label y= 'secondo asse';
label x= 'primo asse';
keep X Y TEXT SIZE XSYS YSYS;
run;
goptions reset=(symbol,axis);
symbol1 v=none;
axis1 order=-1 to 1 by 0.2 length=15 cm;
proc gplot data=ANNOTA;
plot Y*X=1/href=0 vref=0 annotate=ANNOTA frame
haxis=axis1 vaxis=axis1;
title h=1.5 'Carta delle variabili';
run;
SAS OUTPUT: (graphs not reproduced)
I. ERRORS AND READING THE LOG
(taken from SASOnlineDoc)
Error Processing and Debugging
Definitions
SAS performs error processing during both the compilation and the execution phases of SAS
processing. You can debug SAS programs by understanding processing messages in the SAS log and
then fixing your code. You can use the DATA Step Debugger to detect logic errors in a DATA step during
execution.
SAS recognizes five types of errors.
This type of error ...   occurs when ...                                         and is detected at ...
Syntax                   programming statements do not conform to the rules     compile time
                         of the SAS language
Semantic                 the language element is correct, but the element       compile time
                         may not be valid for a particular usage
Execution-time           SAS attempts to execute a program and execution        execution time
                         fails
Data                     data values are invalid                                execution time
Macro-related            you use the macro facility incorrectly                 macro compile time or execution time,
                                                                                DATA or PROC step compile time or
                                                                                execution time
Types of Errors
Syntax Errors
Syntax errors occur when program statements do not conform to the rules of the SAS language.
Examples of syntax errors include
• misspelling a SAS keyword
• using unmatched quotation marks
• forgetting a semicolon
• specifying an invalid statement option
• specifying an invalid data set option.
When SAS encounters a syntax error, it first attempts to correct the error by attempting to interpret
what you meant, then continues processing your program based on its assumptions. If SAS cannot
correct the error, it prints an error message to the log.
In the following example, the DATA statement is misspelled, and SAS prints a warning message to the
log. Because SAS could interpret the misspelled word, the program runs and produces output.
date temp;
x=1;
run;
proc print data=temp;
run;
SAS Log: Syntax Error (misspelled key word)
1    date temp;
     ----
     14
WARNING 14-169: Assuming the symbol DATA was misspelled as date.
2       x=1;
3    run;
NOTE: The data set WORK.TEMP has 1 observations and 1 variables.
NOTE: DATA statement used:
      real time           0.17 seconds
      cpu time            0.04 seconds
4
5    proc print data=temp;
6    run;
NOTE: PROCEDURE PRINT used:
      real time           0.14 seconds
      cpu time            0.03 seconds
Some errors are explained fully by the message that SAS prints in the log; other error messages are not
as easy to interpret because SAS is not always able to detect exactly where the error occurred. For
example, when you fail to end a SAS statement with a semicolon, SAS does not always detect the error
at the point where it occurs because SAS statements are free-format (they can begin and end
anywhere). In the following example, the semicolon at the end of the DATA statement is missing. SAS
prints the word ERROR in the log, identifies the possible location of the error, prints an explanation of
the error, and stops processing the DATA step.
data temp
x=1;
run;
proc print data=temp;
run;
SAS Log: Syntax Error (missing semicolon)
1    data temp
2       x=1;
        76
ERROR 76-322: Syntax error, statement will be ignored.
3    run;
NOTE: The SAS System stopped processing this step because of errors.
      real time           0.11 seconds
      cpu time            0.02 seconds
4
5    proc print data=temp;
ERROR: File WORK.TEMP.DATA does not exist.
6    run;
      real time           0.06 seconds
      cpu time            0.01 seconds
Whether subsequent steps are executed depends on which method of running SAS you use, as well as
on your operating environment.
Semantic Errors
Semantic errors occur when the form of the elements in a SAS statement is correct, but the elements
are not valid for that usage. Semantic errors are detected at compile time and can cause SAS to enter
syntax check mode.
Examples of semantic errors include
• specifying the wrong number of arguments for a function
• using a numeric variable name where only a character variable is valid
• using illegal references to an array.
In the following example, SAS detects an illegal reference to the array ALL.
data _null_;
array all{*} x1-x5;
all=3;
datalines;
1 1.5
. 3
2 4.5
3 2 7
3 . .
;
run;
SAS Log: First Example of a Semantic Error
      cpu time            0.02 seconds
1    data _null_;
2       array all{*} x1-x5;
ERROR: Illegal reference to the array all.
3       all=3;
4       datalines;
      real time           2.28 seconds
      cpu time            0.06 seconds
10   ;
11
The following is another example of a semantic error. In this DATA step, the libref SOMELIB has not
been previously assigned in a LIBNAME statement.
data test;
set somelib.old;
run;
SAS Log: Second Example of a Semantic Error
      cpu time            0.00 seconds
1    data test;
ERROR: Libname SOMELIB is not assigned.
2       set somelib.old;
3    run;
WARNING: The data set WORK.TEST may be incomplete. When this step was stopped
         there were 0 observations and 0 variables.
      real time           0.17 seconds
Execution-Time Errors
Definition
Execution-time errors occur when SAS executes a program that contains data values. Most execution-time
errors produce warning messages or notes in the SAS log but allow the program to continue
executing (see footnote 1). The location of an execution-time error is usually given as line and column
numbers in a note or error message.
Common execution-time errors include the following:
• illegal arguments to functions
• illegal mathematical operations (for example, division by 0)
• observations in the wrong order for BY-group processing
• reference to a nonexistent member of an array (occurs when the array's subscript is out of
range)
• open and close errors on SAS data sets and other files in INFILE and FILE statements
• INPUT statements that do not match the data lines (for example, an INPUT statement in which
you list the wrong columns for a variable or fail to indicate that the variable is a character
variable).
Out-of-Resources Condition
An execution-time error can also occur when you encounter an out-of-resources condition, such as a full
disk, or insufficient memory for a SAS procedure to complete. When these conditions occur, SAS
attempts to find resources for current use. For example, SAS may ask the user for permission to delete
temporary data sets that might no longer be needed, or to free the memory in which macro variables
are stored.
When an out-of-resources condition occurs in a windowing environment, you can use the SAS CLEANUP
system option to display a requestor panel that enables you to choose how to resolve the error. When
you run SAS in batch, noninteractive, or interactive line mode, the operation of CLEANUP depends on
your operating environment. For more information about this system option, see CLEANUP in the "SAS
System Options" chapter in SAS Language Reference: Dictionary, and in the SAS documentation for
your operating environment.
Examples
In the following example, an execution-time error occurs when SAS uses data values from the second
observation to perform the division operation in the assignment statement. Division by 0 is an illegal
mathematical operation and causes an execution-time error.
options linesize=64 nodate pageno=1 pagesize=25;
data inventory;
input Item $ 1-14 TotalCost 15-20
UnitsOnHand 21-23;
UnitCost=TotalCost/UnitsOnHand;
datalines;
Hammers       440   55
Nylon cord    35    0
Ceiling fans  1155  30
;
proc print data=inventory;
format TotalCost dollar8.2 UnitCost dollar8.2;
run;
SAS Log: Execution-Time Error
      cpu time            0.02 seconds
1
2
3
4    data inventory;
5       input Item $ 1-14 TotalCost 15-20
6             UnitsOnHand 21-23;
7       UnitCost=TotalCost/UnitsOnHand;
8       datalines;
NOTE: Division by zero detected at line 12 column 22.
RULE:----+----1----+----2----+----3----+----4----+----5----+---
10   Nylon cord    35    0
Item=Nylon cord TotalCost=35 UnitsOnHand=0 UnitCost=. _ERROR_=1
_N_=2
NOTE: Mathematical operations could not be performed at the
      following places. The results of the operations have been
      set to missing values.
      Each place is given by:
      (Number of times) at (Line):(Column).
      1 at 12:22
NOTE: The data set WORK.INVENTORY has 3 observations and 4
      variables.
      real time           2.78 seconds
      cpu time            0.08 seconds
12   ;
13
14   proc print data=inventory;
15      format TotalCost dollar8.2 UnitCost dollar8.2;
16   run;
NOTE: There were 3 observations read from the dataset
      WORK.INVENTORY.
      real time           2.62 seconds
SAS Output: Execution-Time Error
                    The SAS System                    1
                           Total     Units
Obs    Item                 Cost    OnHand    UnitCost
  1    Hammers           $440.00        55       $8.00
  2    Nylon cord         $35.00         0         .
  3    Ceiling fans     $1155.00        30      $38.50
SAS executes the entire step, assigns a missing value for the variable UnitCost in the output, and writes
the following to the SAS log:
• a note
• the values stored in the input buffer
• the contents of the program data vector at the time the error occurred
• a note explaining the error.
Note that the values listed in the program data vector include the _N_ and _ERROR_ automatic
variables. These automatic variables are assigned temporarily to each observation and are not stored
with the data set.
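One way to avoid the division-by-zero note is to test the denominator before dividing. The following is a minimal sketch on the same inventory data, not part of the SAS documentation excerpt; the rule of setting UnitCost to missing when no units are on hand is an assumption made for illustration.

```sas
data inventory;
   input Item $ 1-14 TotalCost 15-20 UnitsOnHand 21-23;
   /* Divide only when the denominator is positive; a zero or missing
      value of UnitsOnHand fails this test (assumed business rule). */
   if UnitsOnHand > 0 then UnitCost = TotalCost / UnitsOnHand;
   else UnitCost = .;
   datalines;
Hammers       440   55
Nylon cord    35    0
Ceiling fans  1155  30
;

proc print data=inventory;
   format TotalCost dollar8.2 UnitCost dollar8.2;
run;
```

With this guard the division is never attempted for the second observation, so _ERROR_ is not set by the assignment statement.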
In the following example of an execution-time error, the program processes an array and SAS
encounters a value of the array's subscript that is out of range. SAS prints an error message to the log
and stops processing.
data test;
array all{*} x1-x3;
input I measure;
if measure > 0 then
all{I} = measure;
datalines;
1 1.5
. 3
2 4.5
;
proc print data=test;
run;
SAS Log: Execution-Time Error (array subscript out of range)
      cpu time            0.02 seconds
1
2
3    data test;
4       array all{*} x1-x3;
5       input I measure;
6       if measure > 0 then
7          all{I} = measure;
8       datalines;
ERROR: Array subscript out of range at line 12 column 7.
RULE:----+----1----+----2----+----3----+----4----+----5----+---
10   . 3
x1=. x2=. x3=. I=. measure=3 _ERROR_=1 _N_=2
NOTE: The SAS System stopped processing this step because of
      errors.
WARNING: The data set WORK.TEST may be incomplete. When this
         step was stopped there were 1 observations and 5
         variables.
      real time           0.90 seconds
      cpu time            0.09 seconds
12   ;
13
14   proc print data=test;
15   run;
NOTE: There were 1 observations read from the dataset WORK.TEST.
      real time           0.81 seconds
Data Errors
Data errors occur when some data values are not appropriate for the SAS statements that you have
specified in the program. For example, if you define a variable as numeric, but the data value is actually
character, SAS generates a data error. SAS detects data errors during program execution and continues
to execute the program, and does the following:
• writes an invalid data note to the SAS log.
• prints the input line and column numbers that contain the invalid value in the SAS log.
Unprintable characters appear in hexadecimal. To help determine column numbers, SAS prints a
rule line above the input line.
• prints the observation under the rule line.
• sets the automatic variable _ERROR_ to 1 for the current observation.
In this example, a character value in the Number variable results in a data error during program
execution:
data age;
input Name $ Number;
datalines;
Sue 35
Joe xx
Steve 22
;
proc print data=age;
run;
The SAS log shows that there is an error in line 61, position 5-6 of the program.
SAS Log: Data Error
      cpu time            0.01 seconds
1
2
3
4    data age;
5       input Name $ Number;
6       datalines;
NOTE: Invalid data for Number in line 61 5-6.
RULE:----+----1----+----2----+----3----+----4----+----5----+---
8    Joe xx
Name=Joe Number=. _ERROR_=1 _N_=2
NOTE: The data set WORK.AGE has 3 observations and 2 variables.
      real time           0.06 seconds
      cpu time            0.02 seconds
10   ;
11
12   proc print data=age;
13   run;
NOTE: There were 3 observations read from the dataset WORK.AGE.
      real time           0.01 seconds
SAS Output: Data Error
        The SAS System        1
Obs    Name     Number
  1    Sue          35
  2    Joe           .
  3    Steve        22
You can also use the INVALIDDATA= system option to assign a value to a variable when your program
encounters invalid data. For more information, see the INVALIDDATA= system option in SAS Language
Reference: Dictionary.
Format Modifiers for Error Reporting
The INPUT statement uses the ? and the ?? format modifiers for error reporting. The format modifiers
control the amount of information that is written to the SAS log. Both the ? and the ?? modifiers suppress
the invalid data message. However, the ?? modifier also sets the automatic variable _ERROR_ to 0. For
example, these two sets of statements are equivalent:
• input x ?? 10-12;
• input x ? 10-12;
  _error_=0;
In either case, SAS sets the invalid values of X to missing values.
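As a minimal sketch, the ?? modifier can be applied to the data error example above. Column input is used here so that the statement has the same form as the one in the text; the column positions (1-5 and 7-8) are assumptions chosen to fit the re-aligned data lines.

```sas
data age;
   /* ?? suppresses the invalid data note and leaves _ERROR_ at 0 */
   input Name $ 1-5 Number ?? 7-8;
   datalines;
Sue   35
Joe   xx
Steve 22
;

proc print data=age;
run;
```

Number is still set to missing for the second observation; only the reporting in the log changes.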
Macro-related Errors
Several types of macro-related errors exist:
• macro compile time and macro execution-time errors, generated when you use the macro facility
itself
• errors in the SAS code produced by the macro facility.
For more information about macros, see SAS Macro Language: Reference.
FOOTNOTE 1: When you run SAS in noninteractive mode, more serious errors can cause SAS to enter
syntax check mode and stop processing the program.
Error Processing
Syntax Check Mode
If a DATA step has a syntax error, SAS can enter syntax check mode. SAS internally sets the OBS=
option to 0 and the REPLACE/NOREPLACE option to NOREPLACE. When these options are in effect, SAS
• reads the remaining statements in the DATA step
• checks that statements are valid SAS statements
• executes global statements
• identifies any other errors that it finds
• creates the descriptor portion of any output data sets that are specified in program statements
• does not write any observations to new data sets that SAS creates
• does not execute most of the subsequent DATA steps or procedures in the program (exceptions
include PROC DATASETS and PROC CONTENTS).
Note: Any data sets that are created after SAS has entered syntax check mode do not replace existing
data sets with the same name.
How Different Modes Process Errors
When SAS encounters most syntax or semantic errors, SAS underlines the point where it detects the
error and identifies the error by number. If SAS encounters a syntax error when you run noninteractive
SAS programs or batch jobs, it enters syntax check mode and remains in this mode until the program
finishes executing.
When you run SAS in interactive line mode or in a windowing environment, syntax check mode is in
effect only during the step where SAS encountered the error. When the system detects an error, it stops
executing the current step and continues processing the next step.
Processing Multiple Errors
Depending on the type and severity of the error, the method you use to run SAS, and your operating
environment, SAS either stops program processing or flags errors and continues processing. SAS
continues to check individual statements in procedures after it finds certain kinds of errors. Thus, in
some cases SAS can detect multiple errors in a single statement and may issue more error messages for
a given situation, particularly if the statement containing the error creates an output SAS data set.
The following example illustrates a statement with two errors:
data temporary;
Item1=4;
run;
proc print data=temporary;
var Item1 Item2 Item3;
run;
SAS Log: Multiple Program Errors
      cpu time            0.00 seconds
1    data temporary;
2       Item1=4;
3    run;
NOTE: The data set WORK.TEMPORARY has 1 observations and 1
      variables.
      real time           0.10 seconds
      cpu time            0.01 seconds
4
5    proc print data=temporary;
ERROR: Variable ITEM2 not found.
ERROR: Variable ITEM3 not found.
6       var Item1 Item2 Item3;
7    run;
NOTE: The SAS System stopped processing this step because of
      errors.
      real time           0.53 seconds
      cpu time            0.01 seconds
SAS displays two error messages, one for the variable Item2 and one for the variable Item3.
When running debugged production programs that are unlikely to encounter errors, you may want to
force SAS to abend after a single error occurs. You can use the ERRORABEND system option to do this.
Using System Options to Debug a Program
You can use the following system options to control error handling (resolve errors) in your program:
BYERR          controls whether SAS generates an error message and sets the error flag when a
               _NULL_ data set is used in the SORT procedure.
DKRICOND=      controls the level of error detection for input data sets during the processing of DROP=,
               KEEP=, and RENAME= data set options.
DKROCOND=      controls the level of error detection for output data sets during the processing of
               DROP=, KEEP=, and RENAME= data set options and the corresponding DATA step
               statements.
DSNFERR        controls how SAS responds when a SAS data set is not found.
ERRORABEND     specifies how SAS responds to errors.
ERRORCHECK=    controls error handling in batch processing.
ERRORS=        controls the maximum number of observations for which complete error messages are
               printed.
FMTERR         determines whether SAS generates an error message when a format of a variable cannot
               be found.
INVALIDDATA=   specifies the value that SAS assigns to a variable when invalid numeric data is
               encountered.
MERROR         controls whether SAS issues a warning message when a macro-like name does not
               match a macro keyword.
SERROR         controls whether SAS issues a warning message when a defined macro variable
               reference does not match a macro variable.
VNFERR         controls how SAS responds when a _NULL_ data set is used.
For more information, see "SAS System Options" in SAS Language Reference: Dictionary.
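System options such as these are set with the OPTIONS statement. The following sketch shows three of them together; the particular values are illustrative assumptions, not recommendations.

```sas
/* Sketch: the values chosen here are for illustration only */
options errors=5          /* complete error messages for at most 5 observations */
        nofmterr          /* a format that cannot be found is not an error      */
        dkricond=warn;    /* warn on DROP=/KEEP=/RENAME= problems in input data */

/* Write the current setting of ERRORS= to the SAS log */
proc options option=errors;
run;
```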
Using Return Codes
In some operating environments SAS passes a return code to the system, but accessing return codes is
specific to your operating environment.
Operating Environment Information: For more information about return codes, see the SAS
documentation for your operating environment.
J. FURTHER TOPICS: MANIPULATING SAS DATA SETS
J1. Overview of Methods for Combining SAS Data Sets
You can use these methods to combine SAS data sets:
• concatenating
• interleaving
• one-to-one reading
• one-to-one merging
• match merging
• updating.
Concatenating
The following figure shows the results of concatenating two SAS data sets. Concatenating the data sets
appends the observations from one data set to another data set. The DATA step reads DATA1
sequentially until all observations have been processed, and then reads DATA2. Data set COMBINED
contains the results of the concatenation. Note that the data sets are processed in the order in which
they are listed in the SET statement.
Interleaving
The following figure shows the results of interleaving two SAS data sets. Interleaving intersperses
observations from two or more data sets, based on one or more common variables. Data set COMBINED
shows the result.
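The two methods described so far can be sketched as follows. The input data sets and the variables Year and VarX are hypothetical, invented here because the original figures are not reproduced.

```sas
/* Hypothetical inputs, both already sorted by Year */
data data1;
   input Year VarX $;
   datalines;
1991 X1
1992 X2
1993 X3
;

data data2;
   input Year VarX $;
   datalines;
1991 X4
1993 X5
;

/* Concatenating: all observations of DATA1, then all of DATA2 */
data concatenated;
   set data1 data2;
run;

/* Interleaving: the same SET statement plus a BY statement;
   both inputs must be sorted by the BY variable */
data interleaved;
   set data1 data2;
   by Year;
run;
```

Both results contain five observations; only their order differs.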
One-to-One Reading and One-to-One Merging
The following figure shows the results of one-to-one reading and one-to-one merging. One-to-one
reading combines observations from two or more SAS data sets by creating observations that contain all
of the variables from each contributing data set. Observations are combined based on their relative
position in each data set, that is, the first observation in one data set with the first in the other, and so on.
The DATA step stops after it has read the last observation from the smallest data set. One-to-one merging is
similar to a one-to-one reading, with two exceptions: you use the MERGE statement instead of multiple SET
statements, and the DATA step reads all observations from all data sets. Data set COMBINED shows the
result.
Match-Merging
The following figure shows the results of match-merging. Match-merging combines observations from
two or more SAS data sets into a single observation in a new data set based on the values of one or more
common variables. Data set COMBINED shows the results.
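One-to-one reading, one-to-one merging and match-merging can be compared on the same pair of inputs. This is a sketch with hypothetical data sets and variables (Year, VarX, VarY), chosen so that the two inputs have different lengths.

```sas
/* Hypothetical inputs with different variables and different lengths */
data data1;
   input Year VarX $;
   datalines;
1991 X1
1992 X2
1993 X3
;

data data2;
   input Year VarY $;
   datalines;
1991 Y1
1993 Y2
;

/* One-to-one reading: stops after the last observation of the smaller data set */
data read_1to1;
   set data1;
   set data2;
run;

/* One-to-one merging: reads all observations from all data sets */
data merge_1to1;
   merge data1 data2;
run;

/* Match-merging: observations are paired on the common variable Year */
data match_merged;
   merge data1 data2;
   by Year;
run;
```

With these inputs READ_1TO1 has two observations while MERGE_1TO1 and MATCH_MERGED have three; in the two positional results the second observation takes Year=1993 from DATA2, whereas MATCH_MERGED keeps 1992 and pairs Y2 with 1993.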
Updating
The following figure shows the results of updating a master data set. Updating uses information from
observations in a transaction data set to delete, add, or alter information in observations in a master data set.
You can update a master data set by using the UPDATE statement or the MODIFY statement. If you use the
UPDATE statement, your input data sets must be sorted by the values of the variables listed in the BY
statement. (In this example, MASTER and TRANSACTION are both sorted by Year.) If you use the
MODIFY statement, your input data does not need to be sorted.
UPDATE replaces an existing file with a new file, allowing you to add, delete, or rename columns. MODIFY
performs an update in place by rewriting only those records that have changed, or by appending new
records to the end of the file.
Note that by default, UPDATE and MODIFY do not replace nonmissing values in a master data set with
missing values from a transaction data set.
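A minimal sketch of the UPDATE statement follows. The names MASTER, TRANSACTION and Year come from the text; the variables Sales and Cost and all data values are invented for illustration.

```sas
/* Hypothetical master and transaction data sets, both sorted by Year */
data master;
   input Year Sales Cost;
   datalines;
1985 100 80
1986 120 90
1987 150 95
;

data transaction;
   input Year Sales Cost;
   datalines;
1986 . 99
1988 170 110
;

/* 1986: Cost is replaced, while the missing Sales in the transaction
   leaves the master value unchanged (default behavior).
   1988: no matching master observation, so a new one is added. */
data master_updated;
   update master transaction;
   by Year;
run;
```

Writing the result to a new data set, rather than back to MASTER, is a choice made here so that the original can be compared with the updated version.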
J2. MANIPULATING SAS DATA SETS
J2.1. CONCATENATING DATA SETS: USING SET
The SAS data sets (DSS) are appended one after the other, in the order in which they are listed in the SET statement.
The resulting DSS contains the union of the variables of the input DSS, and its number of observations is the
sum of the numbers of observations of the input DSS.

DSS Animali
Comune  Animale   Numero
a       Antilope       5
a       Ariete         .
b       Balena         3
c       Canguro        7

DSS Animali_1
Comune  Animale   Numero
a       Aquila        18
b       Bufalo        19
c       Cervo         17
d       Delfino        .
d       Daino         13
e       Elefante      15
f       Farfalla      16

DSS Piante
Comune  Piante    Numero
a       Ananas        29
b       Banano        25
c       Cocco          .
c       Ciliegio      27
d       Dattero       24
e       Ebano         25

data a.concat1;
set a.animali a.animali_1;
run;
The two input DSS have the same variables.
Obs  Comune  Animale   Numero
  1  a       Antilope       5
  2  a       Ariete         .
  3  b       Balena         3
  4  c       Canguro        7
  5  a       Aquila        18
  6  b       Bufalo        19
  7  c       Cervo         17
  8  d       Delfino        .
  9  d       Daino         13
 10  e       Elefante      15
 11  f       Farfalla      16

data a.concat2;
set a.animali a.piante;
run;
The two input DSS have some variables in common and some different.
Obs  Comune  Animale   Numero  Piante
  1  a       Antilope       5
  2  a       Ariete         .
  3  b       Balena         3
  4  c       Canguro        7
  5  a                     29  Ananas
  6  b                     25  Banano
  7  c                      .  Cocco
  8  c                     27  Ciliegio
  9  d                     24  Dattero
 10  e                     25  Ebano
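To reproduce these examples, the three input data sets can be created with DATALINES. This is a sketch that assumes the variable names Comune, Animale, Piante and Numero shown in the printed output, and that writes to the WORK library instead of the libref a used in the notes.

```sas
/* Input data sets rebuilt in WORK (assumption: the notes store them in library A) */
data animali;
   input Comune $ Animale $ Numero;
   datalines;
a Antilope 5
a Ariete .
b Balena 3
c Canguro 7
;

data animali_1;
   input Comune $ Animale $ Numero;
   datalines;
a Aquila 18
b Bufalo 19
c Cervo 17
d Delfino .
d Daino 13
e Elefante 15
f Farfalla 16
;

data piante;
   input Comune $ Piante $ Numero;
   datalines;
a Ananas 29
b Banano 25
c Cocco .
c Ciliegio 27
d Dattero 24
e Ebano 25
;

/* Same variables in both inputs */
data concat1;
   set animali animali_1;
run;

/* Some variables in common, some different */
data concat2;
   set animali piante;
run;

proc print data=concat2;
run;
```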
J2.2. INTERLEAVING DATA SETS: USING SET – BY AND MERGE – BY
The syntax is:
data DSS;
set DSS-name1 DSS-name2;
BY var-name;

data DSS;
merge DSS-name1 DSS-name2;
BY var-name;
The resulting DSS contains the union of the variables of the input DSS.
The input DSS must be sorted by the BY variable.
With SET – BY:
In the resulting DSS the observations of the input DSS are interleaved according to the BY variable;
the number of observations is therefore the sum of the numbers of observations of the input DSS.
With MERGE – BY:
In the resulting DSS the observations of the input DSS are COMBINED into a single observation
according to the BY variable.
The number of observations of the final DSS is the sum, over the values of the BY variable, of the
maximum number of observations with that value in any of the input DSS.
The value of each variable is retained until all the observations for the current value of the BY
variable have been written.

DSS Animali
Comune  Animale   Numero
a       Antilope       5
a       Ariete         .
b       Balena         3
c       Canguro        7

DSS Animali_1
Comune  Animale   Numero
a       Aquila        18
b       Bufalo        19
c       Cervo         17
d       Delfino        .
d       Daino         13
e       Elefante      15
f       Farfalla      16

DSS Piante
Comune  Piante    Numero
a       Ananas        29
b       Banano        25
c       Cocco          .
c       Ciliegio      27
d       Dattero       24
e       Ebano         25

data a.usosetby1;
set
a.animali
a.animali_1;
by comune;
run;
The two input DSS have the same variables.
Obs  Comune  Animale   Numero
  1  a       Antilope       5
  2  a       Ariete         .
  3  a       Aquila        18
  4  b       Balena         3
  5  b       Bufalo        19
  6  c       Canguro        7
  7  c       Cervo         17
  8  d       Delfino        .
  9  d       Daino         13
 10  e       Elefante      15
 11  f       Farfalla      16

data a.usosetby2;
set a.animali a.piante;
by comune;
run;
The two input DSS have some variables in common and some different.
Obs  Comune  Animale   Numero  Piante
  1  a       Antilope       5
  2  a       Ariete         .
  3  a                     29  Ananas
  4  b       Balena         3
  5  b                     25  Banano
  6  c       Canguro        7
  7  c                      .  Cocco
  8  c                     27  Ciliegio
  9  d                     24  Dattero
 10  e                     25  Ebano

DSS Animali_nr
Comune  Animale   Numero
a       Antilope       5
b       Balena         .
c       Canguro        7

DSS Piante_nr
Comune  Piante    Numero
a       Ananas        29
b       Banano        25
c       Cocco          .
d       Dattero       24
e       Ebano         25

data a.usomerge1;
merge
a.animali_nr
a.piante_nr;
by comune;
run;
The two DSS have a single observation for each value of the BY variable.
Obs  Comune  Animale   Numero  Piante
  1  a       Antilope      29  Ananas
  2  b       Balena        25  Banano
  3  c       Canguro        .  Cocco
  4  d                     24  Dattero
  5  e                     25  Ebano

data a.usomergeby1;
merge a.animali a.piante;
by comune;
run;
Obs  Comune  Animale   Numero  Piante
  1  a       Antilope      29  Ananas
  2  a       Ariete         .  Ananas
  3  b       Balena        25  Banano
  4  c       Canguro        .  Cocco
  5  c       Canguro       27  Ciliegio
  6  d                     24  Dattero
  7  e                     25  Ebano

data a.usomergeby2;
merge a.piante a.animali;
by comune;
run;
Obs  Comune  Piante    Numero  Animale
  1  a       Ananas         5  Antilope
  2  a       Ananas         .  Ariete
  3  b       Banano         3  Balena
  4  c       Cocco          7  Canguro
  5  c       Ciliegio      27  Canguro
  6  d       Dattero       24
  7  e       Ebano         25
One of the two DSS has more than one observation for the same value of the BY variable.
The variable NUMERO (common to the two DSS, but not a BY variable) takes the values
of DSS2. If the observations are duplicated in the first DSS and not in the second, the
variable NUMERO takes the values of the first DSS.
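Because NUMERO exists in both data sets, one of the two values is always lost in the merge. One possible way to keep both, sketched below, is to rename the variable on input with the RENAME= data set option. The new names NumAnimali and NumPiante are hypothetical, and the data sets are rebuilt in WORK rather than read from the libref a.

```sas
data animali;
   input Comune $ Animale $ Numero;
   datalines;
a Antilope 5
a Ariete .
b Balena 3
c Canguro 7
;

data piante;
   input Comune $ Piante $ Numero;
   datalines;
a Ananas 29
b Banano 25
c Cocco .
c Ciliegio 27
d Dattero 24
e Ebano 25
;

/* Rename the common non-BY variable so that neither value overwrites the other */
data merge_rename;
   merge animali(rename=(Numero=NumAnimali))
         piante (rename=(Numero=NumPiante));
   by Comune;
run;

proc print data=merge_rename;
run;
```

Within a BY group, values from the data set with fewer observations are still retained across the extra observations, as described above.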
J2.3. PLACING DATA SETS WITH DIFFERENT VARIABLES SIDE BY SIDE: USING SET – SET AND MERGE
The syntax is:
data DSS;
set DSS-name1;
set DSS-name2;

data DSS;
merge DSS-name1 DSS-name2;
The resulting DSS contains the union of the variables of the input DSS.
The variables of DSS2 are overlaid on those of DSS1; variables that DSS2 does not contain keep the
values read from DSS1. More precisely: the first observation is read from DSS1, then the first
observation from DSS2; for the variables contained in both DSS, the value from DSS1 is replaced by
the value from DSS2, even if that value is missing.
SET – SET case:
The process is repeated until the last observation of the SHORTEST DSS has been read.
The resulting DSS has a number of observations equal to the MINIMUM of the numbers of observations
of the two input DSS.
MERGE case:
The process is repeated until the last observation of the LONGEST DSS has been read.
The resulting DSS has a number of observations equal to the MAXIMUM of the numbers of observations
of the two input DSS.

FIRST CASE: The observations are written in the same order in the two DSS to be placed side by side
(among our examples we consider the DSS that have no repeated observations).

DSS Animali_nr
Comune  Animale   Numero
a       Antilope       5
b       Balena         .
c       Canguro        7

DSS Piante_nr
Comune  Piante    Numero
a       Ananas        29
b       Banano        25
c       Cocco          .
d       Dattero       24
e       Ebano         25

data a.usosetset1;
set a.animali_nr;
set a.piante_nr;
run;
Obs  Comune  Animale   Numero  Piante
  1  a       Antilope      29  Ananas
  2  b       Balena        25  Banano
  3  c       Canguro        .  Cocco

data a.usomsetset2;
set a.piante_nr;
set a.animali_nr;
run;
Obs  Comune  Piante    Numero  Animale
  1  a       Ananas         5  Antilope
  2  b       Banano         .  Balena
  3  c       Cocco          7  Canguro
The input DSS have their observations written in the same order.

data a.usomerge1;
merge
a.animali_nr
a.piante_nr;
run;
Obs  Comune  Animale   Numero  Piante
  1  a       Antilope      29  Ananas
  2  b       Balena        25  Banano
  3  c       Canguro        .  Cocco
  4  d                     24  Dattero
  5  e                     25  Ebano

data a.usomerge2;
merge
a.piante_nr
a.animali_nr;
run;
Obs  Comune  Piante    Numero  Animale
  1  a       Ananas         5  Antilope
  2  b       Banano         .  Balena
  3  c       Cocco          7  Canguro
  4  d       Dattero       24
  5  e       Ebano         25
The input DSS have their observations written in the same order in the two DSS.
The result is analogous to the MERGE – BY case.

SECOND CASE: The observations are NOT written in the same order in the two DSS to be placed side by
side (among our examples we consider the DSS that have repeated observations).

DSS Animali
Comune  Animale   Numero
a       Antilope       5
a       Ariete         .
b       Balena         3
c       Canguro        7

DSS Piante
Comune  Piante    Numero
a       Ananas        29
b       Banano        25
c       Cocco          .
c       Ciliegio      27
d       Dattero       24
e       Ebano         25

data a.usosetset3;
set a.animali;
set a.piante;
run;
Obs  Comune  Animale   Numero  Piante
  1  a       Antilope      29  Ananas
  2  b       Ariete        25  Banano
  3  c       Balena         .  Cocco
  4  c       Canguro       27  Ciliegio

data a.usosetset3;
set a.piante;
set a.animali;
run;
Obs  Comune  Piante    Numero  Animale
  1  a       Ananas         5  Antilope
  2  a       Banano         .  Ariete
  3  b       Cocco          3  Balena
  4  c       Ciliegio       7  Canguro
The input DSS have some variables in common and others different.
DSS2 is overlaid on DSS1; the variable Numero contains the values of DSS2 even if the
value is missing.

data a.usomerge3;
merge
a.animali
a.piante;
run;
Obs  Comune  Animale   Numero  Piante
  1  a       Antilope      29  Ananas
  2  b       Ariete        25  Banano
  3  c       Balena         .  Cocco
  4  c       Canguro       27  Ciliegio
  5  d                     24  Dattero
  6  e                     25  Ebano

data a.usomerge4;
merge
a.piante
a.animali;
run;
Obs  Comune  Piante    Numero  Animale
  1  a       Ananas         5  Antilope
  2  a       Banano         .  Ariete
  3  b       Cocco          3  Balena
  4  c       Ciliegio       7  Canguro
  5  d       Dattero       24
  6  e       Ebano         25
The input DSS have repeated observations for the variable Comune.
Take great care!

THIRD CASE: The input DSS have the same variables.

DSS Animali_1
Comune  Animale   Numero
a       Aquila        18
b       Bufalo        19
c       Cervo         17
d       Delfino        .
d       Daino         13
e       Elefante      15
f       Farfalla      16

data a.usosetset4;
set a.animali;
set a.animali_1;
run;
proc print;
run;
Obs  Comune  Animale   Numero
  1  a       Aquila        18
  2  b       Bufalo        19
  3  c       Cervo         17
  4  d       Delfino        .
J2.4. UPDATING A DSS: THE UPDATE STATEMENT
The syntax of the statement is:
UPDATE master-DSS transaction-DSS;
BY variable(s);
The master DSS is the one to be updated; the transaction DSS contains the updated values.
The values of the BY variable should be unique for each observation of the master DSS: if the master DSS contains two observations with the same value of the BY variable, the first is updated and the second is left unchanged.
If the transaction DSS contains a missing value, the value of the master DSS is kept.

DSS Piante
Comune  Piante    Numero
a       Ananas    29
b       Banano    25
c       Cocco     .
c       Ciliegio  27
d       Dattero   24
e       Ebano     25

DSS Piante_nr
Comune  Piante   Numero
a       Ananas   29
b       Banano   25
c       Cocco    .
d       Dattero  24
e       Ebano    25

DSS Piante_nr_2
Comune  Piante    Numero
c       Cipresso  32
d                 34
e       Edera     35

DSS Piante_3
Comune  Piante    Numero
c       Cipresso  32
c       Cedro     39
d                 34
e       Edera     35

data a.usoupdate;
update
a.piante_nr
a.piante_nr_2;
by comune;
run;

Obs  Comune  Piante    Numero
1    a       Ananas    29
2    b       Banano    25
3    c       Cipresso  32
4    d       Dattero   34
5    e       Edera     35

The master DSS has no repeated values of the BY variable.

data a.usoupdate;
update
a.piante
a.piante_nr_2;
by comune;
run;

Obs  Comune  Piante    Numero
1    a       Ananas    29
2    b       Banano    25
3    c       Cipresso  32
4    c       Ciliegio  27
5    d       Dattero   34
6    e       Edera     35

The master DSS has repeated values of the BY variable. Only the first observation is updated.

data usoupdate;
update piante piante_3;
by comune;
run;

Obs  Comune  Piante    Numero
1    a       Ananas    29
2    b       Banano    25
3    c       Cedro     39
4    c       Ciliegio  27
5    d       Dattero   34
6    e       Edera     35

Both the master DSS and the transaction DSS have repeated values of the BY variable: only the last observation of the transaction DSS takes effect, and it is applied to the first observation of the master DSS.
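The rule on missing values in the transaction data set can be tried in isolation. The following is a minimal sketch with hypothetical data sets master and trans, built from in-stream data and already sorted by the BY variable; in list input a single period is read as a missing value for both numeric and character variables.

```sas
/* Hypothetical master data set, sorted by Comune */
data master;
   input Comune $ Piante $ Numero;
   datalines;
c Cocco 10
d Dattero 24
;
run;

/* Hypothetical transaction data set: each row has one missing value */
data trans;
   input Comune $ Piante $ Numero;
   datalines;
c Cipresso .
d . 34
;
run;

/* Missing values in the transaction data set leave the master values untouched */
data updated;
   update master trans;
   by Comune;
run;

proc print data=updated; run;
/* c: Piante becomes Cipresso, Numero stays 10  */
/* d: Piante stays Dattero,    Numero becomes 34 */
```

If missing values in the transaction data set should instead overwrite the master values, the UPDATE statement accepts the option UPDATEMODE=NOMISSINGCHECK.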
UPDATING A DSS: USING MERGE - BY

data upmergeby;
merge piante piante_3;
by Comune;
run;

Obs  Comune  Piante    Numero
1    a       Ananas    29
2    b       Banano    25
3    c       Cipresso  32
4    c       Cedro     39
5    d                 34
6    e       Edera     35

Both observations with the repeated value of the BY variable have been modified, but the missing value has replaced the original value from the first DSS.
J2.5 ADDING OBSERVATIONS TO A DSS: PROC APPEND
The syntax of the procedure is:
PROC APPEND BASE=master-DSS DATA=transaction-DSS <FORCE>;
The master DSS is the one to which observations are to be added; the observations to add are contained in the transaction DSS.
Variables of the transaction DSS that are not already in the master DSS are ignored.
If the variables are the same in the two DSS, PROC APPEND and the SET statement produce an analogous result; if the master DSS is very large, PROC APPEND can be more efficient than the SET statement.
Warning: the master DSS is modified; it may be advisable to make a copy of the master DSS first.

data a.animali_bis;
set a.animali;
data a.animali1_bis;
set a.animali1;
data a.piante_bis;
set a.piante;
proc append base=a.animali_bis data=a.animali1_bis;
run;
proc print data=a.animali_bis;
run;
proc append base=a.animali_bis data=a.piante_bis force;
run;
proc print data=a.animali_bis;
run;

After the first PROC APPEND (the DSS have the same variables):

Obs  Comune  Animale   Numero
1    a       Antilope  5
2    a       Ariete    .
3    b       Balena    3
4    c       Canguro   7
5    a       Aquila    18
6    b       Bufalo    19
7    c       Cervo     17
8    d       Delfino   .
9    d       Daino     13
10   e       Elefante  15
11   f       Farfalla  16

After the second PROC APPEND (the DSS do not have the same variables, so the FORCE option is required; the variable Piante of the transaction DSS is ignored):

Obs  Comune  Animale   Numero
1    a       Antilope  5
2    a       Ariete    .
3    b       Balena    3
4    c       Canguro   7
5    a       Aquila    18
6    b       Bufalo    19
7    c       Cervo     17
8    d       Delfino   .
9    d       Daino     13
10   e       Elefante  15
11   f       Farfalla  16
12   a                 29
13   b                 25
14   c                 .
15   c                 27
16   d                 24
17   e                 25
J3. GROUPED OBSERVATIONS:
USING SET - BY AND THE FIRST.<..> AND LAST.<..> VARIABLES
Understanding BY Groups
BY Groups with a Single BY Variable
The following figure represents the results of processing your data with the single BY variable ZipCode.
The input SAS data set contains street names, cities, states, and ZIP codes that are arranged in an order
that you can use with the following BY statement:
by ZipCode;
The figure shows five BY groups each containing the BY variable ZipCode. The data set is shown with
the BY variable ZipCode printed on the left for easy reading, but the position of the BY variable in the
observations does not matter.
BY Groups for the Single BY Variable ZipCode
The first BY group contains all observations with the smallest BY value, which is 33133; the second BY
group contains all observations with the next smallest BY value, which is 33146, and so on.
BY Groups with Multiple BY Variables
The following figure represents the results of processing your data with two BY variables, State and City.
This example uses the same data set as in BY Groups with a Single BY Variable, and is arranged in an
order that you can use with the following BY statement:
by State City;
The figure shows three BY groups. The data set is shown with the BY variables State and City printed on
the left for easy reading, but the position of the BY variables in the observations does not matter.
BY Groups for the BY Variables State and City
The observations are arranged so that the observations for Arizona occur first. The observations within
each value of State are arranged in order of the value of City. Each BY group has a unique combination
of values for the variables State and City. For example, the BY value of the first BY group is AZ Tucson,
and the BY value of the second BY group is FL Lakeland.
Invoking BY-Group Processing
You can invoke BY-group processing in both DATA steps and PROC steps by using a BY statement. For
example, the following DATA step program uses the SET statement to combine observations from three
SAS data sets by interleaving the files. The BY statement shows how the data is ordered.
data all_sales;
set region1 region2 region3;
by State City Zip;
... more SAS statements ...
run;
This section describes BY-group processing for the DATA step. For information on BY-group processing
with procedures, see the SAS Procedures Guide.
How the DATA Step Identifies BY Groups
In the DATA step, SAS identifies the beginning and end of each BY group by creating two temporary
variables for each BY variable: FIRST.variable and LAST.variable. These temporary variables are
available for DATA step programming but are not added to the output data set. Their values indicate
whether an observation is
• the first one in a BY group
• the last one in a BY group
• neither the first nor the last one in a BY group
• both first and last, as is the case when there is only one observation in a BY group.
You can take actions conditionally, based on whether you are processing the first or the last observation
of a BY group.
When an observation is the first in a BY group, SAS sets the value of the FIRST.variable to 1. For all
other observations in the BY group, the value of the FIRST.variable is 0. Likewise, if an observation is
the last in a BY group, SAS sets the value of LAST.variable to 1. For all other observations in the BY
group, the value of LAST.variable is 0. If the observations are sorted by more than one BY variable, the
FIRST.variable for each variable in the BY statement is set to 1 at the first occurrence of a new value for
the variable.
This example shows how SAS uses the FIRST.variable and LAST.variable to flag the beginning and
end of four BY groups. Six temporary variables are created within the program data vector. These
variables can be used during the DATA step, but they do not become variables in the new data set.
In the figure that follows, observations in the SAS data set are arranged in an order that can be used with
this BY statement:
by State City ZipCode;
SAS creates the following temporary variables: FIRST.State, LAST.State, FIRST.City, LAST.City,
FIRST.ZipCode, and LAST.ZipCode.
FIRST. and LAST. Values for Four BY Groups
EXAMPLE
The example refers to a set of clinical data; for most patients several measurements are taken at successive times: the variable "giorno" counts the days since the start of the clinical study. The variable "paziente" holds the number that identifies the patient (the program below refers to the patient identifier as id).
The following program builds a DSS with a variable containing the number of measurements for each patient.
data numero;
set tp1.cirr_seq;
by id;
retain num_oss;
if first.id=1 then num_oss=0;
num_oss=num_oss+1;
if last.id=1 then output;
keep id num_oss;
run;
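The same counting logic can be tried without the clinical data. The sketch below uses a small hypothetical data set visits (not part of the notes), already sorted by id; it replaces RETAIN plus assignment with a sum statement, which retains the counter automatically, and writes the FIRST. and LAST. flags to the log so that their values can be inspected.

```sas
/* Hypothetical data: one row per measurement, sorted by id */
data visits;
   input id giorno;
   datalines;
1 0
1 30
1 60
2 0
3 0
3 45
;
run;

data counts;
   set visits;
   by id;
   if first.id then num_oss = 0;   /* reset at the start of each BY group */
   num_oss + 1;                    /* sum statement: num_oss is retained  */
   put id= giorno= first.id= last.id= num_oss=;
   if last.id then output;         /* one row per patient */
   keep id num_oss;
run;

proc print data=counts; run;
/* id=1 has 3 measurements, id=2 has 1, id=3 has 2 */
```

For id=2 the log shows both first.id=1 and last.id=1, the case of a BY group with a single observation.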
K. FURTHER DETAILS: READING RAW DATA
We have already seen three styles of input:
• list input
• column input
• formatted input
In this section we examine two more:
• list input with a format (modified list input)
• named input
K1. LIST INPUT WITH A FORMAT
This is the most flexible style of input.
• The & modifier allows reading character variables that contain blanks (already seen).
• The : (colon) modifier, written after the variable name, allows informats to be used (the same ones as in formatted input). The difference from formatted input is that here SAS reads until it meets a blank; therefore
  o input x : 4.1     (list input with a format)
    means "the value is written in at most 4 columns, one of which holds the tenths digit and one the decimal separator"
  o input x 4.1       (formatted input)
    means "the value is written in exactly 4 columns, one of which holds the tenths digit and one the decimal separator"
• The ~ (tilde) modifier allows reading and keeping single quotation marks, double quotation marks and delimiters inside character variables.
EXAMPLE of the use of : and ~
data scores;
infile datalines dsd;
input Name : $9. Score1-Score3 Team ~ $25. Div $;
datalines;
Smith,12,22,46,"Green Hornets, Atlanta",AAA
Mitchel,23,19,25,"High Volts, Portland",AAA
Jones,09,17,54,"Vulcans, Las Vegas",AA
;
proc print data=scores noobs;
run;
SAS OUTPUT
Name     Score1  Score2  Score3  Team                      Div
Smith    12      22      46      "Green Hornets, Atlanta"  AAA
Mitchel  23      19      25      "High Volts, Portland"    AAA
Jones    9       17      54      "Vulcans, Las Vegas"      AA
K2. NAMED INPUT
Named input is used to read data in which each value is preceded by the name of the variable and an equal sign (=).
data games;
input name=$ score1= score2=;
datalines;
name=riley score1=1132 score2=1187
;
proc print data=games;
run;
K3. HOLDING THE INPUT RECORD: USING @
@
   holds an input record for the execution of the next INPUT statement within the same iteration of the DATA step. This line-hold specifier is called trailing @.
   Restriction: The trailing @ must be the last item in the INPUT statement.
   Tip: The trailing @ prevents the next INPUT statement from automatically releasing the current input record and reading the next record into the input buffer. It is useful when you need to read from a record multiple times.
Example: Holding a Record in the Input Buffer
This example reads a file that contains two kinds of input data records and creates a SAS data set from
these records. One type of data record contains information about a particular college course. The second
type of record contains information about the students enrolled in the course. You need two INPUT
statements to read the two records and to assign the values to different variables that use different
formats. Records that contain class information have a C in column 1; records that contain student
information have an S in column 1, as shown here:
----+----1----+----2----+
C HIST101 Watson
S Williams 0459
S Flores   5423
C MATH202 Sen
S Lee      7085
To know which INPUT statement to use, check each record as it is read. Use an INPUT statement that
reads only the variable that tells whether the record contains class or student.
data schedule(drop=type);
infile file-specification;
retain Course Professor;
input type $ 1 @;
if type='C' then
input course $ professor $;
else if type='S' then
do;
input Name $10. Id;
output schedule;
end;
proc print; run;
The first INPUT statement reads the TYPE value from column 1 of every line. Because this INPUT
statement ends with a trailing @, the next INPUT statement in the DATA step reads the same line. The
IF-THEN statements that follow check whether the record is a class or student line before another INPUT
statement reads the rest of the line. The INPUT statements without a trailing @ release the held line. The
RETAIN statement saves the values about the particular college course. The DATA step writes an
observation to the SCHEDULE data set after a student record is read.
The following output that PROC PRINT generates shows the resulting data set SCHEDULE.
The SAS System                                   1
OBS  Course   Professor  Name      Id
1    HIST101  Watson     Williams  459
2    HIST101  Watson     Flores    5423
3    MATH202  Sen        Lee       7085
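Because the program above reads from a placeholder file-specification, it cannot be run as printed. The sketch below is one way to try it with in-stream data; the LENGTH statement and the alignment of the student records (Name within columns 2-11, Id starting in column 12) are assumptions made for this illustration.

```sas
data schedule(drop=type);
   infile datalines truncover;
   length Course $8 Professor $8 Name $10;
   retain Course Professor;
   input type $1. @;                 /* hold the record after reading column 1 */
   if type='C' then
      input Course $ Professor $;    /* course record: values are retained     */
   else if type='S' then do;
      input Name $10. Id;            /* student record: read the rest and write */
      output;
   end;
   datalines;
C HIST101 Watson
S Williams 0459
S Flores   5423
C MATH202 Sen
S Lee      7085
;
run;

proc print data=schedule; run;
```

Each INPUT statement without a trailing @ releases the held record, so the next iteration of the DATA step starts on a new line.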
Example: Positioning the Pointer with a Numeric Variable
This example uses a numeric variable to position the pointer. A raw data file contains records with the
employment figures for several offices of a multinational company. The input data records are
----+----1----+----2----+----3----+
8      New York    1 USA 14
5   Cary           1 USA 2274
3 Chicago          1 USA 37
22                   Tokyo 5 ASIA 80
5   Vancouver      2 CANADA 6
9       Milano     4 EUROPE 123
The first column has the column position for the office location. The next numeric column is the region
category. The geographic region occurs before the number of employees in that office.
You determine the office location by combining the @numeric-variable pointer control with a trailing
@. To read the records, use two INPUT statements. The first INPUT statement obtains the value for the
@ numeric-variable pointer control. The second INPUT statement uses this value to determine the
column that the pointer moves to.
data office (drop=x);
infile file-specification;
input x @;
if 1<=x<=10 then
input @x City $9.;
else do;
put 'Invalid input at line ' _n_;
delete;
end;
run;
The DATA step writes only five observations to the OFFICE data set. The fourth input data record is
invalid because the value of X is greater than 10. Therefore, the second INPUT statement does not
execute. Instead, the PUT statement writes a message to the SAS log and the DELETE statement stops
processing the observation.
K4. INFILE OPTIONS FOR READING DELIMITED DATA WITH LIST INPUT
By default, the delimiter to read input data records with list input is a blank space. Both the DSD option
and the DELIMITER= option affect how list input handles delimiters. The DELIMITER= option
specifies that the INPUT statement use a character other than a blank as a delimiter for data values that
are read with list input. When the DSD option is in effect, the INPUT statement uses a comma as the
default delimiter.
To read a value as missing between two consecutive delimiters, use the DSD option. By default, the
INPUT statement treats consecutive delimiters as a unit. When you use DSD, the INPUT statement treats
consecutive delimiters separately. Therefore, a value that is missing between consecutive delimiters is
read as a missing value. To change the delimiter from a comma to another value, use the DELIMITER=
option.
For example, this DATA step program uses list input to read data that are separated with commas. The
second data line contains a missing value. Because SAS allows consecutive delimiters with list input, the
INPUT statement cannot detect the missing value.
data scores;
infile datalines delimiter=',';
input test1 test2 test3;
datalines;
91,87,95
97,,92
,1,1
;
With the FLOWOVER option in effect, the data set SCORES contains two, not three, observations. The second observation is built incorrectly:
OBS  TEST1  TEST2  TEST3
1    91     87     95
2    97     92     1
To correct the problem, use the DSD option in the INFILE statement. Now the INPUT statement detects the two consecutive delimiters and therefore assigns a missing value to variable TEST2 in the second observation.
OBS  TEST1  TEST2  TEST3
1    91     87     95
2    97     .      92
3    .      1      1
The DSD option also enables list input to read a character value that contains a delimiter within a quoted
string. For example, if data are separated with commas, DSD enables you to place the character string in
quotation marks and read a comma as a valid character. SAS does not store the quotation marks as part of
the character value. To retain the quotation marks as part of the value, use the tilde (~) format modifier in
an INPUT statement.
Example 1: Changing How Delimiters are Treated
By default, the INPUT statement uses a blank as the delimiter. This DATA step uses a comma as the delimiter:
data num;
infile datalines dsd;
input x y z;
datalines;
,2,3
4,5,6
7,8,9
;
The argument DATALINES in the INFILE statement allows you to use an INFILE statement option to
read in-stream data lines. The DSD option sets the comma as the default delimiter. Because a comma
precedes the first value in the first dataline, a missing value is assigned to variable X in the first
observation, and the value 2 is assigned to variable Y.
If the data uses multiple delimiters or a single delimiter other than a comma, simply specify the delimiter
values with the DELIMITER= option. In this example, the characters a and b function as delimiters:
data nums;
infile datalines dsd delimiter='ab';
input X Y Z;
datalines;
1aa2ab3
4b5bab6
7a8b9
;
The output that PROC PRINT generates shows the resulting NUMS data set. Values are missing for variables in the first and second observation because DSD causes list input to detect two consecutive delimiters. If you omit DSD, the characters a, b, aa, ab, ba, or bb function as the delimiter and no variables are assigned missing values.
The SAS System                                   1
OBS  X  Y  Z
1    1  .  2
2    4  5  .
3    7  8  9
This DATA step uses modified list input and the DSD option to read data that are separated by commas
and that may contain commas as part of a character value:
data scores;
infile datalines dsd;
input Name : $9. Score
      Team : $25. Div $;
datalines;
Joseph,76,"Red Racers, Washington",AAA
Mitchel,82,"Blue Bunnies, Richmond",AAA
Sue Ellen,74,"Green Gazelles, Atlanta",AA
;
The output that PROC PRINT generates shows the resulting SCORES data set. The delimiter (comma) is stored as part of the value of TEAM while the quotation marks are not. To retain the quotation marks in character data, use the tilde (~) format modifier in the INPUT statement.
OBS  NAME       SCORE  TEAM                     DIV
1    Joseph     76     Red Racers, Washington   AAA
2    Mitchel    82     Blue Bunnies, Richmond   AAA
3    Sue Ellen  74     Green Gazelles, Atlanta  AA
K5. SPECIAL MISSING VALUES
A special missing value is a type of numeric missing value that enables you to represent different categories of missing data by using the letters A-Z or an underscore.
Example
The following example uses data from a marketing research company. Five testers were hired to test five
different products for ease of use and effectiveness. If a tester was absent, there is no rating to report, and
the value is recorded with an X for "absent." If the tester was unable to test the product adequately, there
is no rating, and the value is recorded with an I for "incomplete test." The following program reads the
data and displays the resulting SAS data set. Note the special missing values in the first and third data
lines:
data period_a;
missing X I;
input Id $4. Foodpr1 Foodpr2 Foodpr3 Coffeem1 Coffeem2;
datalines;
1001 115 45 65 I 78
1002 86 27 55 72 86
1004 93 52 X 76 88
1015 73 35 43 112 108
1027 101 127 39 76 79
;
proc print data=period_a;
title 'Results of Test Period A';
footnote1 'X indicates TESTER ABSENT';
footnote2 'I indicates TEST WAS INCOMPLETE';
run;
The following output is produced:
                    Results of Test Period A
Obs  Id    Foodpr1  Foodpr2  Foodpr3  Coffeem1  Coffeem2
1    1001  115      45       65       I         78
2    1002  86       27       55       72        86
3    1004  93       52       X        76        88
4    1015  73       35       43       112       108
5    1027  101      127      39       76        79
                    X indicates TESTER ABSENT
                    I indicates TEST WAS INCOMPLETE
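Once read, special missing values can be tested in expressions by writing them as a period followed by the letter. The sketch below uses a hypothetical data set flags with a single rating variable; the MISSING function is true for every kind of missing value, so it catches the ordinary missing value as well.

```sas
data flags;
   missing X I;                      /* X and I in the data are special missing values */
   input Id $ Rating;
   length Status $10;
   if Rating = .X then Status = 'absent';
   else if Rating = .I then Status = 'incomplete';
   else if missing(Rating) then Status = 'unknown';   /* ordinary missing value */
   else Status = 'rated';
   datalines;
1001 65
1002 X
1003 I
1004 .
;
run;

proc print data=flags; run;
```

Statistical procedures treat .X and .I like any other missing value, so the distinction matters only for reporting and for conditional logic such as the one above.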
L. FURTHER DETAILS: FORMATS FOR READING AND WRITING DATA
L1. FORMAT STATEMENT
L2. INFORMAT STATEMENT
L3. LENGTH STATEMENT
L4. ATTRIB STATEMENT
L5. PROC FORMAT
   L5.1. VALUE statement
   L5.2. INVALUE statement
   L5.3. PICTURE statement
   L5.4. Some examples of changing formats
   L5.5. Functions for converting a character variable to numeric and vice versa
L7. SOME ROUNDING FUNCTIONS
L8. SOME FUNCTIONS ON CHARACTER VARIABLES
Associating Informats and Formats with Variables

In a DATA step
   Informats: Use the ATTRIB or INFORMAT statement to permanently associate an informat with a variable. Use the INPUT function or INPUT statement to associate the informat with the variable only for the duration of the DATA step.
   Formats: Use the ATTRIB or FORMAT statement to permanently associate a format with a variable. Use the PUT function or PUT statement to associate the format with the variable only for the duration of the DATA step.

In a PROC step
   Informats: The ATTRIB and INFORMAT statements are valid in base SAS procedures. However, in base SAS software, typically you do not assign informats in PROC steps because the data have already been read into SAS variables.
   Formats: Use the ATTRIB statement or the FORMAT statement to associate formats with variables. If you use either statement in a procedure that produces an output data set, the format is permanently associated with the variable in the output data set. If you use either statement in a procedure that does not produce an output data set, the statement associates the format with the variable only for the duration of the PROC step.
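The temporary association made by the INPUT and PUT functions can be sketched in a few lines. The values used here ($1,000,000 read with COMMA11. and 692 written with WORDS22.) are the ones quoted elsewhere in these notes; the data set name convert and the variable names are hypothetical.

```sas
data convert;
   amount_txt = '$1,000,000';
   amount = input(amount_txt, comma11.);   /* character to numeric, informat used once */
   words  = put(692, words22.);            /* numeric to character, format used once   */
   put amount= words=;                     /* writes both values to the log            */
run;

proc contents data=convert; run;   /* no format or informat is stored for amount */
```

Because the informat and the format are applied only inside the function calls, the descriptor of the output data set records neither of them.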
BUILDING AN EXAMPLE SAS DATA SET
data pippo2;
input a $ x;
datalines;
mnopqr 55
fghilm 52
mnopqr 53
;
L1. FORMAT STATEMENT
A format is an instruction that SAS uses to write data values. The FORMAT statement is used to control the appearance of the data or, in some cases, to group the data to be analysed. For example, the format WORDS22., which converts numeric values to their spelled-out form (in English), writes the numeric value 692 as six hundred ninety-two.
The syntax is
<$>format<w>.<d>
L1.1 - Context: DATA step
1 - changing the format of a numeric variable
data pluto;
set pippo2;
format x 5.1;

Obs  a       x
1    mnopqr  55.0
2    fghilm  52.0
3    mnopqr  53.0

The CONTENTS Procedure
...
--Alphabetic List of Variables and Attributes--
#  Variable  Type  Len  Pos  Format
---------------------------------------
1  a         Char  8    8
2  x         Num   8    0    5.1

2 - changing the format of a character variable
data pluto2;
set pippo2;
format a $3.;

Obs  a    x
1    mno  55
2    fgh  52
3    mno  53

...
--Alphabetic List of Variables and Attributes--
#  Variable  Type  Len  Pos  Format
1  a         Char  8    8    $3.
2  x         Num   8    0

NOTE:
In both cases the length of the variable remains the original one.
L1.2 - Context: PROC step
1 - changing the format of a numeric variable
data pluto;
set pippo2;
proc print;
format x 5.1;
proc contents;
run;

Obs  a       x
1    mnopqr  55.0
2    fghilm  52.0
3    mnopqr  53.0

#  Variable  Type  Len  Pos
-----------------------------
1  a         Char  8    8
2  x         Num   8    0

2 - changing the format of a character variable
data pluto2;
set pippo2;
proc print;
format a $3.;
proc contents;
run;

Obs  a    x
1    mno  55
2    fgh  52
3    mno  53

#  Variable  Type  Len  Pos
1  a         Char  8    8
2  x         Num   8    0
L2. INFORMAT STATEMENT
An informat is an instruction that SAS uses to read data values and assign them to a variable.
For example, the following value contains a dollar sign and some commas:
$1,000,000
To remove the dollar sign ($) and the commas (,) before storing the numeric value 1000000 in a variable, this value must be read with the informat COMMA11. .
If a variable has not been explicitly defined beforehand, SAS uses the informat to determine whether the variable is numeric or character and to determine the length of a character variable.
The syntax is
<$>informat<w>.<d>

Informats by category
CHARACTER      instructs SAS to read character data values into character variables.
COLUMN-BINARY  instructs SAS to read data stored in column-binary or multipunched form into character and numeric variables.
DATE and TIME  instructs SAS to read data values into variables that represent dates, times, and datetimes.
NUMERIC        instructs SAS to read numeric data values into numeric variables.
USER-DEFINED   instructs SAS to read data values by using an informat that is created with an INVALUE statement in PROC FORMAT.
Context: DATA step
1 - changing the informat of a numeric variable
data pluto;
set pippo2;
informat x 5.1;

Obs  a       x
1    mnopqr  55
2    fghilm  52
3    mnopqr  53

...
----Alphabetic List of Variables and Attributes----
#  Variable  Type  Len  Pos  Informat
-----------------------------------------
1  a         Char  8    8
2  x         Num   8    0    5.1

2 - changing the informat of a character variable
data pluto2;
set pippo2;
informat a $3.;

Obs  a       x
1    mnopqr  55
2    fghilm  52
3    mnopqr  53

...
----Alphabetic List of Variables and Attributes----
#  Variable  Type  Len  Pos  Informat
-----------------------------------------
1  a         Char  8    8    $3.
2  x         Num   8    0

NOTE: In both cases the length of the variable remains the original one.
SOME OF THE MAIN INFORMATS AND FORMATS
FOR CHARACTER VARIABLES, NUMERIC VARIABLES, AND DATES AND TIMES
Categories and Descriptions of Informats

Category       Informat     Description
Character      $CHARw.      Reads/Writes character data with blanks
               $CHARZBw.    Converts binary 0s to blanks
               $QUOTEw.     Removes matching quotation marks from character data
               $REVERJw.    Reads/Writes character data from right to left and preserves blanks
               $REVERSw.    Reads/Writes character data from right to left and left aligns
               $UPCASEw.    Converts character data to uppercase
               $VARYINGw.   Reads/Writes character data of varying length
               $w.          Reads/Writes standard character data
Date and Time  DATEw.       Reads/Writes date values in the form ddmmmyy or ddmmmyyyy
               DATETIMEw.   Reads/Writes datetime values in the form ddmmmyy hh:mm:ss.ss or ddmmmyyyy hh:mm:ss.ss
               DDMMYYw.     Reads/Writes date values in the form ddmmyy or ddmmyyyy
               MMDDYYw.     Reads/Writes date values in the form mmddyy or mmddyyyy
               MONYYw.      Reads/Writes month and year date values in the form mmmyy or mmmyyyy
               TIMEw.       Reads/Writes hours, minutes, and seconds in the form hh:mm:ss.ss
               YYMMDDw.     Reads/Writes date values in the form yymmdd or yyyymmdd
               YYMMNw.      Reads/Writes date values in the form yyyymm or yymm
               YYQw.        Reads/Writes quarters of the year
Numeric        COMMAw.d     Removes embedded characters
               COMMAXw.d    Removes embedded characters
               Ew.d         Reads/Writes numeric values that are stored in scientific notation and double-precision scientific notation
               FLOATw.d     Reads/Writes a native single-precision, floating-point value and divides it by 10 raised to the dth power
               HEXw.        Converts hexadecimal positive binary values to either integer (fixed-point) or real (floating-point) binary values
               NUMXw.d      Reads/Writes numeric values with a comma in place of the decimal point
               PERCENTw.d   Reads/Writes percentages as numeric values
               w.d          Reads/Writes standard numeric data
               YENw.d       Removes embedded yen signs, commas, and decimal points
               ZDw.d        Reads/Writes zoned decimal data
               ZDBw.d       Reads/Writes zoned decimal data in which zeros have been left blank
               ZDVw.d       Reads/Writes and validates zoned decimal data
L3. LENGTH STATEMENT
Specifies the number of bytes that SAS uses to store the values of variables. It can be used only in a DATA step.
The syntax is:
LENGTH <variable-1> <...variable-n> <$> <length> <DEFAULT=n>;
To change the length of a variable of an existing DSS, the statement must be placed before the DSS is referenced (with SET, MERGE, ...).
Context: DATA step
1 - changing the length of a character variable (LENGTH placed before SET)
data pluto2;
length a $3;
set pippo2;
...
#  Variable  Type  Len  Pos
1  a         Char  3    8
2  x         Num   8    0

2 - LENGTH placed after SET: the length of the character variable does not change
data pluto2;
set pippo2;
length a $3;
...
#  Variable  Type  Len  Pos
1  a         Char  8    8
2  x         Num   8    0
L4. ATTRIB STATEMENT
Associates a format, informat, label, and/or length with one or more variables.
Syntax
ATTRIB variable-list(s) attribute-list(s);
Arguments
variable-list
   names the variables that you want to associate with the attributes.
   Tip: List the variables in any form that SAS allows.
attribute-list
   specifies one or more attributes to assign to variable-list. Specify one or more of these attributes in the ATTRIB statement:
   FORMAT=format
      associates a format with variables in variable-list.
      Tip: The format can be either a standard SAS format or a format that is defined with the FORMAT procedure.
   INFORMAT=informat
      associates an informat with variables in variable-list.
      Tip: The informat can be either a standard SAS informat or an informat that is defined with the FORMAT procedure.
   LABEL='label'
      associates a label with variables in variable-list.
   LENGTH=<$>length
      specifies the length of variables in variable-list.
      Requirement: Put a dollar sign ($) in front of the length of character variables.
      Tip: Use the ATTRIB statement before the SET statement to change the length of variables in an output data set when you use an existing data set as input.
      Range: For character variables, the range is 1 to 32,767 for all operating environments.
      Operating Environment Information: For numeric variables, the minimum length you can specify with the LENGTH= specification is 2 in some operating environments and 3 in others.
Details
The Basics
Using the ATTRIB statement in the DATA step permanently associates attributes with variables by changing the descriptor information of the SAS data set that contains the variables.
You can use ATTRIB in a PROC step, but the rules are different.
How SAS Treats Variables when You Assign Informats with the INFORMAT= Option
on the ATTRIB Statement
Informats that are associated with variables by using the INFORMAT= option on the ATTRIB statement
behave like informats that are used with modified list input. SAS reads the variables by using the
scanning feature of list input, but applies the informat. In modified list input, SAS
• does not use the value of w in an informat to specify column positions or input field widths in an external file
• uses the value of w in an informat to specify the length of previously undefined character variables
• ignores the value of w in numeric informats
• uses the value of d in an informat in the same way it usually does for numeric informats
• treats blanks that are embedded as input data as delimiters unless you change their status with a DELIMITER= option specification in an INFILE statement.
If you have coded the INPUT statement to use another style of input, such as formatted input or column
input, that style of input is not used when you use the INFORMAT= option on the ATTRIB statement.
Comparisons
You can use either an ATTRIB statement or an individual attribute statement such as FORMAT,
INFORMAT, LABEL, and LENGTH to change an attribute that is associated with a variable.
Examples
Here are examples of ATTRIB statements that contain
• single variable and single attribute:
     attrib cost length=4;
• single variable with multiple attributes:
     attrib saleday informat=mmddyy.
                    format=worddate.;
• multiple variables with the same multiple attributes:
     attrib x y length=$4 label='TEST VARIABLE';
• multiple variables with different multiple attributes:
     attrib x length=$4 label='TEST VARIABLE'
            y length=$2 label='RESPONSE';
• variable list with single attribute:
     attrib month1-month12
            label='MONTHLY SALES';
L5. PROC FORMAT
L5.1. The VALUE statement
VALUE builds a label (a user-defined format) that remains available for the whole of the open session.
L5.1a - Context: DATA step
proc format;
   value P low-53='basso' 53-high='alto';
   /* NB: 'basso' for values <=53 */
   value $C 'mnopqr'='0' 'fghilm'='1';
run;
data pluto3;
   set pippo2;
   format x P. a $C.;   /* a is a character variable, so its format is $C. */
run;
PROC PRINT output:
Obs    a    x
  1    0    alto
  2    1    basso
  3    0    basso
PROC CONTENTS output:
#    Variable    Type    Len    Pos    Format
1    a           Char      8      8    $C.
2    x           Num       8      0    P.
L5.1b - Context: PROC step
proc format;
   value P low-53='basso' 53-high='alto';
   /* NB: 'basso' for values <=53 */
   value $C 'mnopqr'='0' 'fghilm'='1';
run;
data pluto3;
   set pippo2;
run;
proc print;
   format x P. a $C.;
run;
proc contents; run;
PROC PRINT output:
Obs    a    x
  1    0    alto
  2    1    basso
  3    0    basso
PROC CONTENTS output (no Format column: the formats were assigned only inside PROC PRINT):
#    Variable    Type    Len    Pos
1    a           Char      8      8
2    x           Num       8      0
L5.2. The INVALUE statement
1 - from character to numeric
proc format;
   invalue fb 'm'=1 'f'=2;
run;
data pippo;
   input a fb. x;
   datalines;
m 55
f 52
m 53
;
run;
PROC PRINT output:
Obs    a     x
  1    1    55
  2    2    52
  3    1    53
PROC CONTENTS output:
...
#    Variable    Type    Len    Pos
1    a           Num       8      0
2    x           Num       8      8
2 - from numeric to character
proc format;
   invalue $fc low-53='basso'
               53-high='alto';
run;
data pippo;
   length x $ 6;
   input a $ x $fc.;
   datalines;
m 55
f 52
m 53
;
run;
PROC PRINT output:
Obs    a    x
  1    m    alto
  2    f    basso
  3    m    basso
PROC CONTENTS output:
...
#    Variable    Type    Len    Pos
1    a           Char      8      0
2    x           Char      6      8
L5.3. The PICTURE statement (writing numeric variables)
data pippo;
   input a b : ddmmyy. c;
   datalines;
21111.5 12/05/01 1213344
3.8 13/12/02 1223
;
proc format;
   picture separa low-high='000,000.000' (dig3sep=',' decsep='.');
   picture sep low-high='000@000&000' (dig3sep='@' decsep='&');   /* a rather odd choice */
   picture data (default=20) low-high='%A,%d/%m/%Y' (datatype=date);
   picture doll low-high='000,000,000' (dig3sep=',' prefix='$ ');
run;
A format defined with PICTURE is invoked in the usual way, regardless of the context (DATA step
or PROC step) in which you are working:
   format a separa. b data. c doll.;
The output of
proc print;
   format a separa. b data. c doll.;
run;
is the following:
Obs             a    b                               c
  1    21,111.500    Saturday,12/5/2001    $ 1,213,344
  2         3.800    Friday,13/12/2002         $ 1,223
PROC CONTENTS gives the following results, depending on whether the formatting was done in a
PROC step (first table) or in a DATA step (second table):
#    Variable    Type    Len    Pos
1    a           Num       8      0
2    b           Num       8      8
3    c           Num       8     16

#    Variable    Type    Len    Pos    Format
1    a           Num       8      0    SEPARA.
2    b           Num       8      8    DATA.
3    c           Num       8     16    DOLL.
The syntax of the PICTURE statement is:
PICTURE name <(format-option(s))> <value-range-set-1 <(picture-1-option(s))>
<...value-range-set-n <(picture-n-option(s))>>>;
and the following options are available:
Use this option    To do this
Control the attributes of the format
DEFAULT=           Specify the default length of the format
FUZZ=              Specify a fuzz factor for matching values to a range
MAX=               Specify a maximum length for the format
MIN=               Specify a minimum length for the format
MULTILABEL         Specify multiple pictures for a given value or range and for overlapping ranges
NOTSORTED          Store values or ranges in the order that you define them
ROUND              Round the value to the nearest integer before formatting
Control the attributes of each picture in the format
FILL=              Specify a character that completes the formatted value
MULTIPLIER=        Specify a number to multiply the variable's value by before it is formatted
NOEDIT             Specify that numbers are message characters rather than digit selectors
PREFIX=            Specify a character prefix for the formatted value
Some further arguments can also be used; we list those that appear in the example:
DECSEP='character' specifies the separator character for the fractional part of a number.
Default: . (a decimal point)
DIG3SEP='character' specifies the three-digit separator character for a number.
Default: , (a comma)
PREFIX='prefix'
specifies a character prefix to place in front of the value's first significant digit. You must use zero
digit selectors or the prefix will not be used.
The picture must be wide enough to contain both the value and the prefix. If the picture is not
wide enough to contain both the value and the prefix, the format truncates or omits the prefix.
Typical uses for PREFIX= are printing leading dollar signs and minus signs. For example, the
PAY. format prints the variable value 25500 as $25,500.00:
picture pay low-high='000,009.99'
(prefix='$');
Default:
no prefix
Interaction: If you use the FILL= and PREFIX= options in the same picture, the format places
the prefix and then the fill characters.
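A minimal sketch of the PAY. picture quoted above, applied with the PUT function and written to the log. The variable names amount and text are assumptions; according to the text above, the value 25500 should appear as $25,500.00.

```sas
/* The PAY. picture is the one shown in the text above. */
proc format;
   picture pay low-high='000,009.99' (prefix='$');
run;

/* Hypothetical variables AMOUNT and TEXT, used only for illustration. */
data _null_;
   amount = 25500;
   length text $ 12;
   text = put(amount, pay.);   /* formatted value as a character string */
   put amount= text=;
run;
```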
DATATYPE=DATE | TIME | DATETIME
specifies that you can use directives in the picture as a template to format date, time, or datetime
values. The main directives are:
%a    Locale's abbreviated weekday name
%A    Locale's full weekday name
%b    Locale's abbreviated month name
%B    Locale's full month name
%d    Day of the month as a decimal number (1-31), with no leading zero
%H    Hour (24-hour clock) as a decimal number (0-23), with no leading zero
%j    Day of the year as a decimal number (1-366), with no leading zero
%m    Month as a decimal number (1-12), with no leading zero
%M    Minute as a decimal number (0-59), with no leading zero
%y    Year without century as a decimal number (0-99), with no leading zero
%Y    Year with century as a decimal number
%U    Week number of the year (Sunday as the first day of the week) as a decimal number
      (0-53), with no leading zero
L5.4. Summary: some examples of format changes
A. CHANGING THE FORMAT ON INPUT
1 - from character to numeric
proc format;
   invalue fb 'm'=1 'f'=2;
run;
data pippo;
   input a fb. x;
   datalines;
m 55
f 52
m 53
;
run;
PROC PRINT output:
Obs    a     x
  1    1    55
  2    2    52
  3    1    53
PROC CONTENTS output:
...
#    Variable    Type    Len    Pos
1    a           Num       8      0
2    x           Num       8      8
2 - from numeric to character
proc format;
   invalue $fc low-53='basso'
               53-high='alto';
run;
data pippo;
   input a $ x $fc.;
   length x $ 6;   /* placed after INPUT: x keeps length 5, as PROC CONTENTS shows below */
   datalines;
m 55
f 52
m 53
;
run;
PROC PRINT output:
Obs    a    x
  1    m    alto
  2    f    basso
  3    m    basso
PROC CONTENTS output:
...
#    Variable    Type    Len    Pos
1    a           Char      8      0
2    x           Char      5      8
B. CHANGING THE FORMAT WHEN READING THE DATA FROM A SAS DATA SET (DSS)
data pippo2;
input a $ x;
datalines;
mnopqr 55
fghilm 52
mnopqr 53
;
1 - changing the format of a numeric variable
data pluto;
   set pippo2;
   attrib x format=5.1;
run;
PROC PRINT output:
Obs    a            x
  1    mnopqr    55.0
  2    fghilm    52.0
  3    mnopqr    53.0
PROC CONTENTS output:
...
#    Variable    Type    Len    Pos    Format
1    a           Char      8      8
2    x           Num       8      0    5.1
2 - changing the format of a character variable
data pluto2;
   set pippo2;
   attrib a format=$3.;
run;
PROC PRINT output:
Obs    a       x
  1    mno    55
  2    fgh    52
  3    mno    53
PROC CONTENTS output:
...
#    Variable    Type    Len    Pos    Format
1    a           Char      8      8    $3.
2    x           Num       8      0
3 - change of format, but NOT of type: the format acts like a label that remains available for the
whole of the open session
data pluto3;
   set pippo2;
   attrib x format=P. a format=$C.;   /* user-defined formats from L5.1, as in the output below */
run;
PROC PRINT output:
Obs    a    x
  1    0    alto
  2    1    basso
  3    0    basso
PROC CONTENTS output:
...
#    Variable    Type    Len    Pos    Format
1    a           Char      8      8    $C.
2    x           Num       8      0    P.
L5.5. FUNCTIONS FOR CONVERTING A CHARACTER VARIABLE TO NUMERIC AND VICE VERSA
PUT
Returns a value using a specified format
Syntax
PUT(source, format.)
Arguments
source
identifies the SAS variable or constant whose value you want to reformat. The source argument can
be character or numeric.
format.
contains the SAS format that you want applied to the variable or constant that is specified in the
source. To override the default alignment, you can add an alignment specification to a format:
-L
left aligns the value.
-C
centers the value.
-R
right aligns the value.
Restriction:
The format. must be of the same type as the source, either character or numeric.
Details
The format must be the same type (numeric or character) as the value of source. The result of the PUT
function is always a character string. If the source is numeric, the resulting string is right aligned. If the
source is character, the result is left aligned.
Use PUT to convert a numeric value to a character value. PUT writes (or produces a reformatted result)
only while it is executing. To preserve the result, assign it to a variable.
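A short sketch of the alignment specification described above, assuming made-up variable names x, right and left. The brackets written by the PUT statement are only there to make the padding visible in the log, and $CHAR8. is used so that leading blanks are kept when the strings are written.

```sas
/* Hypothetical example: variable names are illustrative only. */
data _null_;
   x = 42;
   length right left $ 8;
   right = put(x, 8.);      /* numeric source: right aligned by default */
   left  = put(x, 8. -L);   /* -L overrides the default alignment       */
   put '[' right $char8. ']';
   put '[' left  $char8. ']';
run;
```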
Comparisons
The PUT statement and the PUT function are similar. The PUT function returns a value using a specified
format. You must use an assignment statement to store the value in a variable. The PUT statement writes
a value to an external destination (either the SAS log or a destination you specify).
INPUT
Returns the value produced when a SAS expression is read using a specified informat
Syntax
INPUT(source, <? | ??>informat.)
Arguments
source contains the SAS character expression to which you want to apply a specific informat.
? or ??
The optional question mark (?) and double question mark (??) format modifiers suppress the
printing of both the error messages and the input lines when invalid data values are read. The ?
modifier suppresses the invalid data message. The ?? modifier also suppresses the invalid data
message and, in addition, prevents the automatic variable _ERROR_ from being set to 1 when
invalid data are read.
informat.
is the SAS informat that you want to apply to the source.
Details
The INPUT function enables you to read the value of source by using a specified informat. The informat
determines whether the result is numeric or character. Use INPUT to convert character values to numeric
values.
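The effect of the ?? modifier can be sketched as follows. The variable names raw and num and the three test strings are assumptions made for the illustration: the string 'abc' cannot be read with the 8. informat, so num should be missing for that value, and with ?? no invalid-data note is written and _ERROR_ stays 0.

```sas
/* Hypothetical example: variable names and test values are illustrative only. */
data _null_;
   length raw $ 8;
   do raw = '123', 'abc', '4.5';
      /* ?? suppresses the invalid data message and keeps _ERROR_ at 0 */
      num = input(raw, ?? 8.);
      put raw= num= _error_=;
   end;
run;
```

Removing the ?? modifier and rerunning the step is a quick way to see the messages that it suppresses.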
Comparisons
The INPUT function returns the value produced when a SAS expression is read using a specified
informat. You must use an assignment statement to store that value in a variable. The INPUT statement
uses an informat to read a data value and then optionally stores that value in a variable.
Examples
Example 1: Converting Numeric Values to Character Values
In this example, the first statement converts the values of CC, a numeric variable, into the four-character
hexadecimal format, and the second writes the same value that the PUT function returns.
cchex=put(cc,hex4.);
put cc hex4.;
Example 2: Converting Character Values to Numeric Values
This example uses the INPUT function to convert a character value to a numeric value and store it in
another variable. The COMMA9. informat reads the value of the SALE variable, stripping the commas.
The resulting value, 2115353, is stored in FMTSALE.
data testin;
input sale $9.;
fmtsale=input(sale,comma9.);
datalines;
2,115,353
;
Example 3: Using PUT and INPUT Functions
In this example, PUT returns a numeric value as a character string. The value 122591 is assigned to the
CHARDATE variable. INPUT returns the value of the character string as a SAS date value using a SAS
date informat. The value 11681 is stored in the SASDATE variable.
numdate=122591;
chardate=put(numdate,z6.);
sasdate=input(chardate,mmddyy6.);
Definitions
SAS date value
is a value that represents the number of days between January 1, 1960, and a specified date. SAS
can perform calculations on dates ranging from A.D. 1582 to A.D. 19,900. Dates before January
1, 1960, are negative numbers; dates after are positive numbers.
• SAS date values account for all leap year days, including the leap year day in the year 2000.
• SAS date values can reliably tell you what day of the week a particular day fell on as far back
as September 1752, when the calendar was adjusted by dropping several days. SAS day-of-the-week
and length-of-time calculations are accurate in the future to A.D. 19,900.
• Various SAS language elements handle SAS date values: functions, formats and informats.
SAS time value
is a value representing the number of seconds since midnight of the current day. SAS time values
are between 0 and 86400.
SAS datetime value
is a value representing the number of seconds between January 1, 1960 and an
hour/minute/second within a specified date.
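The three kinds of value defined above can be related to one another with a small sketch. The variable names are assumptions; the date 17 March 2000 and the time 14:45:32 are the ones used in the tables of these notes. DHMS builds a datetime value from a date value, and DATEPART and TIMEPART split it again.

```sas
/* Hypothetical example: variable names are illustrative only. */
data _null_;
   d  = '17mar2000'd;           /* SAS date value: days since 01JAN1960, here 14686  */
   dt = dhms(d, 14, 45, 32);    /* SAS datetime value: seconds since 01JAN1960       */
   t  = timepart(dt);           /* SAS time value: seconds since midnight, here 53132 */
   back = datepart(dt);         /* back to the SAS date value                        */
   put d= dt= t=;
   put back= date9.;
run;
```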
The following figure shows some dates written in calendar form and as SAS date values.
How SAS Converts Calendar Dates to SAS Date Values
Two-Digit and Four-Digit Years
SAS software can read two-digit or four-digit year values. If SAS encounters a two-digit year, the
YEARCUTOFF= option can be used to specify which century within a 100 year span the two-digit year
should be attributed to. For example, YEARCUTOFF=1950 means that two-digit years 50 through 99
correspond to 1950 through 1999, while two-digit years 00 through 49 correspond to 2000 through 2049.
Note that while the default value of the YEARCUTOFF= option in Version 8 of the SAS System is 1920,
you can adjust the YEARCUTOFF= value in a DATA step to accommodate the range of date values you
are working with at the moment. To correctly handle 2-digit years representing dates between 2000 and
2099, you should specify an appropriate YEARCUTOFF= value between 1901 and 2000.
The Year 2000
SAS software treats the year 2000 like any other leap year. If you use two-digit year numbers for dates,
you'll probably need to adjust the default setting for the YEARCUTOFF= option to work with date
ranges for your data, or switch to four-digit years. The following program changes the YEARCUTOFF=
value to 1950. This change means that all two digit dates are now assumed to fall in the 100-year span
from 1950 to 2049.
options yearcutoff=1950;
data _null_;
a='26oct02'd;
put 'SAS date='a;
put 'formatted date='a date9.;
run;
The PUT statement writes the following lines to the SAS log:
SAS date=15639
formatted date=26OCT2002
Working with SAS Dates and Times
Informats and Formats
The SAS System converts date, time and datetime values back and forth between calendar dates and
clock times with SAS language elements called formats and informats.
• Formats present a value, recognized by SAS, such as a time or date value, as a calendar date or
clock time in a variety of lengths and notations.
• Informats read notations or a value, such as a clock time or a calendar date, which may be in a
variety of lengths, and then convert the data to a SAS date, time, or datetime value.
Date and Time Tools by Task
The following table correlates tasks with various SAS System language elements that are available for
working with time and date data.
To write SAS date values in recognizable forms, use these date formats (the main ones):
Format          Input    Result
DATEw.          14686    17MAR00
DAYw.           14686    17
DDMMYYw.        14686    17/03/00
DDMMYY10.       14686    17/03/2000
DDMMYYBw.       14686    17 03 00
DDMMYYB10.      14686    17 03 2000
DDMMYYCw.       14686    17:03:00
DDMMYYC10.      14686    17:03:2000
DDMMYYDw.       14686    17-03-00
DDMMYYD10.      14686    17-03-2000
DDMMYYNw.       14686    17MAR00
DDMMYYN10.      14686    17MAR2000
DDMMYYPw.       14686    17.03.00
DDMMYYP10.      14686    17.03.2000
DDMMYYSw.       14686    17/03/00
DDMMYYS10.      14686    17/03/2000
DOWNAME.        14686    Friday
EURDFDEw.       14686    17MAR00
EURDFDE9.       14686    17MAR2000
EURDFDNw.       14686    5
EURDFDWNw.      14686    Friday
EURDFMYw.       14686    MAR00
EURDFMY7.       14686    MAR2000
EURDFWDXw.      14686    17MAR2000
EURDFMNw.       14686    March
EURDFWKXw.      14686    Friday, 17 MAR 2000
MMDDYYw.        14686    03/17/00
MMDDYY10.       14686    03/17/2000
MMDDYYBw.       14686    03 17 00
MMDDYYB10.      14686    03 17 2000
MMDDYYCw.       14686    03:17:00
MMDDYYC10.      14686    03:17:2000
MMDDYYDw.       14686    03-17-00
MMDDYYD10.      14686    03-17-2000
MMDDYYSw.       14686    03/17/00
MMDDYYS10.      14686    03/17/2000
MMYYxw.         14686    03M2000
MMYYCw.         14686    03:2000
MMYYD.          14686    03-2000
MMYYN.          14686    032000
MMYYP.          14686    03.2000
MMYYS.          14686    03/2000
MONNAME.        14686    March
MONTH.          14686    3
MONYY.          14686    MAR2000
TIMEw.d         14686    4:04:46
TIMEAMPMw.d     14686    4:04:46 AM
TOD.            14686    4:04:46
WEEKDATEw.      14686    Friday, March 17, 2000
WEEKDAYw.       14686    6
WORDDATEw.      14686    March 17, 2000
For weekday names in Italian use ITADFDWNw.; in French use FRADFDWNw.
To read calendar dates as SAS date values, use these date informats (the main ones). Note: YEARCUTOFF=1920
Informat      Input         Result
DATEw.        17MAR2000     -14534
DATE9.        17MAR2000     14686
DDMMYYw.      170300        14686
DDMMYY8.      17032000      14686
MMDDYYw.      031700        14686
MMDDYY10.     03172000      14686
MONYYw.       MAR00         14670
NENGOw.       H.12/03/17    14686
YYMMDDw.      000317        14686
YYMMDD10.     20000317      14686
YYQw.         00Q1          14610

Other tasks (To do this ... / Use this ...), each followed by the available elements with input and result:
Date Tasks
Extract a date from a datetime value: date functions
   DATEPART             '17MAR00:00:00'DT     14686
Return today's date as a SAS date: date functions
   DATE() or TODAY()    ()                    SAS date for today
Extract calendar dates from SAS: date functions
   DAY                  14686                 17
   HOUR                 14686                 4
   MINUTE               14686                 4
   MONTH                14686                 3
   WEEKDAY              14686                 6
   YEAR                 14686                 2000
Write a date as a constant in an expression: SAS date constant, 'ddmmmyy'd or 'ddmmmyyyy'd
   '17mar00'd                                 14686
   '17mar2000'd                               14686
Time Tasks
Write SAS time values as time values: time formats
   HHMM.                53132                 14:46
   HOUR.                53132                 15
   MMSS.                53132                 885
   TIME.                53132                 14:45:32
   TOD.                 53132                 14:45:32
Read time values as SAS time values: time informats
   TIME                 14:45:32              53132
Return the current time of day as a SAS time value: time functions
   TIME()               ()                    SAS time value at moment of execution in NNNNN.NN
Return the time part of a SAS datetime value: time functions
   TIMEPART             SAS datetime value in NNNNNNNNNN.N
                                              SAS time value part of date value in NNNNN.NN
Datetime Tasks
Write SAS datetime values as datetime values: datetime formats
   DATEAMPM             1217083532            26JUL98:02:45 PM
   DATETIME             1268870400            17MAR00:00:00:00
   EURDFDT              1217083532            26JUL98:14:45:32
Read datetime values as SAS datetime values: datetime informats
   DATETIME             17MAR00:00:00:00      1268870400
Return the current date and time of day as a SAS datetime value: datetime functions
   DATETIME()           ()                    SAS datetime value at moment of execution in NNNNNNNNNN.N
Interval Tasks
Return the number of specified time intervals that lie between the two date or datetime values: interval functions
   INTCK                week2 01aug60 01jan01 1055
Advance a date, time, or datetime value by a given interval, and return a date, time, or datetime value: interval functions
   INTNX                day 14086 01jan60     14086
Examples
Example 1: Displaying Date, Time, and Datetime Values as Recognizable Dates and Times
The following example demonstrates how a value may be displayed as a date, a time, or a datetime.
Remember to select the SAS language element that converts a SAS date, time, or datetime value to the
intended date, time or datetime format. See the previous tables for examples.
Note:
• Time formats count the number of seconds within a day, so the values will be between 0 and
  86400.
• DATETIME formats count the number of seconds since January 1, 1960, so for datetimes that are
  greater than 02JAN1960:00:00:01, (integer of 86401) the datetime value will always be greater
  than the time value.
• When in doubt, look at the contents of your data set for clues as to which type of value you are
  dealing with.
This program uses the DATETIME, DATE and TIMEAMPM formats to display the value 86399 as a
date and time, a calendar date, and a time.
options nodate pageno=1 linesize=80 pagesize=60;
data test;
   Time1=86399;
   format Time1 datetime.;
   Date1=86399;
   format Date1 date.;
   Time2=86399;
   format Time2 timeampm.;
run;
/* PROC PRINT displays the data set with the title and footnotes */
proc print data=test;
   title 'Same Number, Different SAS Values';
   footnote1 'Time1 is a SAS DATETIME value';
   footnote2 'Date1 is a SAS DATE value';
   footnote3 'Time2 is a SAS TIME value.';
run;
Datetime, Date and Time Values for 86399
           Same Number, Different SAS Values
Obs    Time1               Date1      Time2
  1    01JAN60:23:59:59    20JUL96    11:59:59 PM

           Time1 is a SAS DATETIME value
           Date1 is a SAS DATE value
           Time2 is a SAS TIME value.
Example 2: Reading, Writing, and Calculating Date Values
This program reads four regional meeting dates and calculates the dates on which announcements should
be mailed.
data meeting;
options nodate pageno=1 linesize=80 pagesize=60;
input region $ mtg : mmddyy8.;
sendmail=mtg-45;
datalines;
N 11-24-99
S 12-28-99
E 12-03-99
W 10-04-99
;
proc print data=meeting;
format mtg sendmail date9.;
title 'When To Send Announcements';
run;
Calculated Date Values: When to Send Mail
           When To Send Announcements
Obs    region    mtg          sendmail
  1    N         24NOV1999    10OCT1999
  2    S         28DEC1999    13NOV1999
  3    E         03DEC1999    19OCT1999
  4    W         04OCT1999    20AUG1999
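The interval functions listed under Interval Tasks can be tried with a sketch such as the following. The arguments are taken from the INTCK and INTNX rows of that table, written here with four-digit years on the assumption that 01aug60 and 01jan01 mean 1960 and 2001; the variable names are made up. For these two calls the table reports 1055 and 14086.

```sas
/* Hypothetical example: variable names are illustrative only. */
data _null_;
   /* Number of two-week interval boundaries between the two dates */
   n = intck('week2', '01aug1960'd, '01jan2001'd);
   /* Advance 01JAN1960 (SAS date value 0) by 14086 daily intervals */
   d = intnx('day', '01jan1960'd, 14086);
   put n= d= d date9.;
run;
```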
SOME OF THE MAIN FUNCTIONS FOR DATES AND TIMES
Function    Description
DATDIF      Returns the number of days between two dates
DATE        Returns the current date as a SAS date value
DATEPART    Extracts the date from a SAS datetime value
DATETIME    Returns the current date and time of day as a SAS datetime value
DAY         Returns the day of the month from a SAS date value
DHMS        Returns a SAS datetime value from date, hour, minute, and second
HMS         Returns a SAS time value from hour, minute, and second values
HOUR        Returns the hour from a SAS time or datetime value
MDY         Returns a SAS date value from month, day, and year values
MINUTE      Returns the minute from a SAS time or datetime value
MONTH       Returns the month from a SAS date value
QTR         Returns the quarter of the year from a SAS date value
SECOND      Returns the second from a SAS time or datetime value
TIME        Returns the current time of day
TIMEPART    Extracts a time value from a SAS datetime value
TODAY       Returns the current date as a SAS date value
WEEKDAY     Returns the day of the week from a SAS date value
YEAR        Returns the year from a SAS date value
YRDIF       Returns the difference in years between two dates
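A few of the functions in this list can be combined in a short sketch. The variable names are assumptions; the date is 17 March 2000, the one used in the format tables above (SAS date value 14686, a Friday, which WEEKDAY reports as 6 because Sunday counts as 1).

```sas
/* Hypothetical example: variable names are illustrative only. */
data _null_;
   d  = mdy(3, 17, 2000);   /* build a SAS date value from month, day, year */
   dd = day(d);             /* day of the month  */
   mm = month(d);           /* month number      */
   yy = year(d);            /* four-digit year   */
   q  = qtr(d);             /* quarter of the year */
   wd = weekday(d);         /* 1=Sunday ... 7=Saturday */
   put d= dd= mm= yy= q= wd=;
run;
```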
DATDIF Returns the number of days between two dates
DATDIF(sdate,edate,basis)
Arguments
sdate specifies a SAS date value that identifies the starting date.
edate specifies a SAS date value that identifies the ending date.
basis identifies a character constant or variable that describes how SAS calculates the date difference:
'30/360' specifies a 30 day month and a 360 day year. Each month is considered to have 30 days,
and each year 360 days, regardless of the actual number of days in each month or year.
'ACT/ACT' (or 'Actual')uses the actual number of days between dates.
Examples
In the following example, DATDIF returns the actual number of days between two dates, and the number
of days based on a 30-day month and 360-day year.
data _null_;
   sdate='16oct78'd;
   edate='16feb96'd;
   actual=datdif(sdate, edate, 'act/act');
   days360=datdif(sdate, edate, '30/360');
   put actual= days360=;
run;
SAS Statements    Results
put actual=;      6332
put days360=;     6240
L7. SOME ROUNDING FUNCTIONS
ROUND Rounds to the nearest roundoff unit
Syntax
ROUND(argument,round-off-unit)
SAS Statements            Results
var1=223.456;
x=round(var1,1);
put x 9.5;                223.00000
var2=223.456;
x=round(var2,.01);
put x 9.5;                223.46000
x=round(223.456,100);
put x 9.5;                200.00000
x=round(223.456);
put x 9.5;                223.00000
x=round(223.456,.3);
put x 9.5;                223.50000
FLOOR Returns the largest integer that is less than or equal to the argument
Syntax
FLOOR(argument)
SAS Statements            Results
var1=2.1;
a=floor(var1);
put a;                    2
var2=-2.4;
b=floor(var2);
put b;                    -3
c=floor(3);
put c;                    3
d=floor(-1.6);
put d;                    -2
e=floor(1.-1.e-13);
put e;                    1
INT Returns the integer value
Syntax
INT(argument)
SAS Statements            Results
var1=2.1;
x=int(var1);
put x=;                   2
var2=-2.4;
y=int(var2);
put y=;                   -2
a=int(3);
put a=;                   3
b=int(-1.6);
put b=;                   -1
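The difference between the three functions shows up most clearly on negative values, as in the tables above. A small sketch, with made-up variable names, that applies all three to the same inputs so that the results can be compared side by side in the log:

```sas
/* Hypothetical example: variable names are illustrative only. */
data _null_;
   do v = 2.1, -2.4, -1.6;
      r = round(v);   /* nearest integer                          */
      f = floor(v);   /* largest integer <= v (moves toward -inf) */
      i = int(v);     /* integer part (truncates toward zero)     */
      put v= r= f= i=;
   end;
run;
```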
L8. SOME FUNCTIONS FOR CHARACTER VARIABLES
SUBSTR (right of =) Extracts a substring from an argument
Syntax
<variable=>SUBSTR(argument,position<,n>)
Arguments
variable    specifies a valid SAS variable name.
argument    specifies any SAS character expression.
position    specifies a numeric expression that is the beginning character position.
n           specifies a numeric expression that is the length of the substring to extract.
            If n is larger than the length of the expression that remains in argument after position,
            SAS extracts the remainder of the expression.
            Tip: If you omit n, SAS extracts the remainder of the expression.
Details
The SUBSTR function returns a portion of an expression that you specify in argument. The portion
begins with the character specified by position and is the number of characters specified by n.
A variable that is created by SUBSTR obtains its length from the length of argument.
SAS Statements              Results
                            ----+----1----+----2
date='06MAY98';
month=substr(date,3,3);
year=substr(date,6,2);
put @1 month @5 year;       MAY 98
SUBSTR (left of =) Replaces character value contents
Syntax
SUBSTR(argument,position<,n>)=characters-to-replace
Arguments
argument    specifies a character variable.
position    specifies a numeric expression that is the beginning character position.
n           specifies a numeric expression that is the length of the substring that will be replaced.
            Restriction: n cannot be larger than the length of the expression that remains in argument
            after position.
            Tip: If you omit n, SAS uses all of the characters on the right side of the assignment
            statement to replace the values of argument.
characters-to-replace
            specifies a character expression that will replace the contents of argument.
            Tip: Enclose a literal string of characters in quotation marks.
Details
When you use the SUBSTR function on the left side of an assignment statement, SAS replaces the value
of argument with the expression on the right side. SUBSTR replaces n characters starting at the
character you specify in position.
SAS Statements              Results
a='KIDNAP';
substr(a,1,3)='CAT';
put a;                      CATNAP
b=a;
substr(b,4)='TY';
put b;                      CATTY
COMPBL Removes multiple blanks from a character string
The COMPBL function removes multiple blanks in a character string by translating each occurrence
of two or more consecutive blanks into a single blank.
The value that the COMPBL function returns has a default length of 200. You can use the LENGTH
statement, before calling COMPBL, to set the length of the value.
SAS Statements                      Results
                                    ----+----1----+----2-
string='Hey   Diddle   Diddle';
string=compbl(string);
put string;                         Hey Diddle Diddle
TRIM Removes trailing blanks from a character expression
Syntax
TRIM(argument)
argument specifies any SAS character expression.
Details
TRIM copies a character argument, removes all trailing blanks, and returns the trimmed argument as a
result. If the argument is blank, TRIM returns one blank. TRIM is useful for concatenating because
concatenation does not remove trailing blanks.
Assigning the results of TRIM to a variable does not affect the length of the receiving variable. If the
trimmed value is shorter than the length of the receiving variable, SAS pads the value with new
blanks as it assigns it to the variable.
Comparisons
The TRIM and TRIMN functions are similar. TRIM returns one blank for a blank string. TRIMN returns
a null string (zero blanks) for a blank string.
Example 1: Removing Trailing Blanks
data test;
   input part1 $ 1-10 part2 $ 11-20;
   hasblank=part1||part2;
   noblank=trim(part1)||part2;
   put hasblank;
   put noblank;
   datalines;
apple     sauce
;
Data Line                Results
                         ----+----1----+----2
apple     sauce          apple     sauce
                         applesauce
Example 2: Concatenating a Blank Character Expression
SAS Statements                           Results
x="A"||trim(" ")||"B"; put x;            A B
x=" "; y=">"||trim(x)||"<"; put y;       > <
UPCASE Converts all letters in an argument to uppercase
Syntax
UPCASE(argument)
argument specifies any SAS character expression.
Details
The UPCASE function copies a character argument, converts all lowercase letters to uppercase letters,
and returns the altered value as a result.
SAS Statements                      Results
name=upcase('John B. Smith');
put name;                           JOHN B. SMITH
SCAN Selects a given word from a character expression
Syntax
SCAN(argument,n<, delimiters>)
Arguments
argument      specifies any character expression.
n             specifies a numeric expression that produces the number of the word in the character
              string you want SCAN to select.
              Tip: If n is negative, SCAN selects the word in the character string starting from the
              end of the string. If |n| is greater than the number of words in the character string,
              SCAN returns a blank value.
delimiters    specifies a character expression that produces characters that you want SCAN to use as
              word separators in the character string.
              Default: If you omit delimiters in an ASCII environment, SAS uses the following characters:
              blank . < ( + & ! $ * ) ; ^ - / , % |
              Tip: If you represent delimiters as a constant, enclose delimiters in quotation marks.
SAS Statements              Results
arg='ABC.DEF(X=Y)';
word=scan(arg,3);
put word;                   X=Y
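The two tips for SCAN (a negative n, and an explicit delimiter list) can be sketched as follows, reusing the string from the example above. The variable names last and first are assumptions made for the illustration.

```sas
/* Hypothetical example: variable names are illustrative only. */
data _null_;
   arg = 'ABC.DEF(X=Y)';
   /* Negative n: count words from the end, with the default delimiters */
   last  = scan(arg, -1);
   /* Explicit delimiter list: only the period separates words */
   first = scan(arg, 1, '.');
   put last= first=;
run;
```

With the default delimiters the last word is the same X=Y that the example above selects with n=3; with '.' as the only delimiter the first word is ABC.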
OTHER MAIN FUNCTIONS FOR CHARACTER VARIABLES
COMPRESS     Removes specific characters from a character string
DEQUOTE      Removes quotation marks from a character value
INDEX        Searches a character expression for a string of characters
INDEXC       Searches a character expression for specific characters
INDEXW       Searches a character expression for a specified string as a word
LEFT         Left aligns a SAS character expression
LENGTH       Returns the length of an argument
LOWCASE      Converts all letters in an argument to lowercase
MISSING      Returns a numeric result that indicates whether the argument contains a missing value
QUOTE        Adds double quotation marks to a character value
REPEAT       Repeats a character expression
REVERSE      Reverses a character expression
RIGHT        Right aligns a character expression
SPEDIS       Determines the likelihood of two words matching, expressed as the asymmetric spelling
             distance between the two words
TRANSLATE    Replaces specific characters in a character expression
TRANWRD      Replaces or removes all occurrences of a word in a character string
TRIMN        Removes trailing blanks from character expressions and returns a null string (zero
             blanks) if the expression is missing
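A few of the functions in this list can be tried on the string from the COMPBL example. This is only a sketch with made-up variable names; the replacement word 'Fiddle' is likewise an arbitrary choice.

```sas
/* Hypothetical example: variable names and the replacement word are illustrative only. */
data _null_;
   s = 'Hey Diddle Diddle';
   pos      = index(s, 'Diddle');               /* position of the first occurrence */
   nospace  = compress(s);                      /* with one argument, removes blanks */
   replaced = tranwrd(s, 'Diddle', 'Fiddle');   /* replaces every occurrence         */
   lower    = lowcase(s);                       /* all letters in lowercase          */
   put pos= / nospace= / replaced= / lower=;
run;
```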
M. FURTHER TOPICS: SAS MACROS
M1. Introduction to macro programming
The problem
1. We have to read 80 data files containing measurements taken at 80 different stations in the Ross Sea
(Antarctica). The files are in text format and are named:
m001.asc ... m009.asc m010.asc ... m080.asc
In each file the data are separated by blanks, one observation per line; the first
observation is on the second line. The variables are, in order:
- depth (profond)
- temperature (temper)
- salinity (salinita)
- 4 variables of no interest
- fluorescence (fluoresc)
- 2 variables of no interest
- density (densita)
- 1 variable of no interest
A variable holding the number of the measuring station (the number that appears in the file name)
must be added to each DSS.
SAS PROGRAM (for the first file)
data a.ant1;
infile 'C:\ANTARTIDE\m001.asc' firstobs=2;
input profond temper salinita x1 x2 x3 x4 fluoresc x5 x6 densita x7;
stazione=1;
drop x1-x7;
run;
The program has to be repeated for all 80 files, changing each time the number of the input file,
that of the output file, and the station number.
2. The DSS built in this way must then be concatenated one after the other with a DATA step such as:
data a.tutti;
   set a.ANT1 a.ANT2 a.ANT3 a.ANT4
       /* ... and so on, up to the DSS ... */
       a.ANT80;
run;
In situations like this one, macro programming solves the problem quickly.
A macro is a piece of SAS program with its own language and its own particular syntax.
Macros are extremely useful in situations, such as the one above, in which strings of text
have to be produced and concatenated with other strings.
For the problem above (step 2) one has to write
A.ANT1 A.ANT2 A.ANT3 A.ANT4 … up to A.ANT80
where the first part of the string, A.ANT, is always the same and the second part is a number ranging from 1 to 80.
Statements of the macro language begin with the symbol %
A macro begins with the statement
%macro <name>;
and ends with the statement
%mend <name>;
A macro that produces the strings
A.ANT1 A.ANT2 A.ANT3 A.ANT4 … A.ANT80
is the following:
%macro concatena;
   %do i=1 %to 80;
      a.ant&i
   %end;
%mend concatena;
Note that the macro variable i is written &i, that is, preceded by the symbol &, when it is
used, whereas where it is declared (in the %do loop) it is written without any
prefix.
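One way to check what such a macro generates, before using it in a SET statement, is to write its result to the log with %PUT. The sketch below uses a hypothetical copy of the macro, named concatena_demo and limited to three iterations, so that it does not interfere with the macro defined in these notes.

```sas
/* Hypothetical shortened copy of the macro, for checking purposes only. */
%macro concatena_demo;
   %do i=1 %to 3;
      a.ant&i
   %end;
%mend concatena_demo;

/* Write the generated text to the log instead of executing it */
%put Generated list: %concatena_demo;
```

The log line should list a.ant1, a.ant2 and a.ant3 after the text "Generated list:". Setting OPTIONS MPRINT; is another common way to see the SAS code that a macro produces.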
The macro is called in the program as follows:
data a.tutti;
set %concatena
;
run;
It is possible to build a more general macro that receives from outside, when it is
called, the first and last numbers in the DSS names (1 and 80 in the previous case).
In that case the names of the macro parameters must be written in parentheses after the macro name:
%macro concatena(in, fin);
%do i=&in %to &fin;
a.ant&i
%end;
%mend concatena;
As already seen, the macro parameters (here in and fin) must be preceded by the symbol &
when they are used.
data a.tutti;
set
%concatena(1,80)
/* or %concatena(4,15) .... */
;
run;
Let us generalize the macro further so that, when it is called, it also receives from outside
the first part of the string that forms the DSS names (A.ANT in the previous
example)
%macro concatena(in, fin, nomeds);
%do i=&in %to &fin;
&nomeds.&i
%end;
%mend concatena;
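To build a string by concatenating two macro variable references (&nomeds and &i), a period can be
placed after the first one: it marks the end of the reference and does not appear in the resolved text.
The period is strictly required only when the text that follows could be read as part of the variable
name; &nomeds&i resolves to the same string.
&nomeds.&i
The call becomes: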
data a.tutti;
set %concatena(1,80,a.ant)
;
run;
Finally, we also include in the macro the part of the program that builds the output DSS,
adding the name of the output DSS to the macro parameters.
%macro concatena(nomeds,in,fin,dsout);
data &dsout;
set
%do i=&in %to &fin;
&nomeds.&i
%end;
;
run;
%mend concatena;
It is called as follows:
%concatena(a.ant,4,8,a.finale);
We now build a macro that simplifies reading the data from the text files.
To begin with, we consider the files numbered from 1 to 9.
%macro lettura;
%do n=1 %to 9;
data a.ant&n;
infile "C:\ANTARTIDE\m00&n..asc"
firstobs=2;
input profond temper salinita
x1 x2 x3 x4 fluoresc x5 x6
densita x7;
stazione=&n;
drop x1-x7;
run;
%end;
%mend lettura;
Note that:
1. to use a macro variable inside a string delimited by the symbol ' (single quotation
mark), as in the initial program, that symbol must be replaced by the
symbol " (double quotation mark);
2. of the two periods that appear in
m00&n..asc
the first separates the macro variable reference from the text that follows
and the second is part of the file name.
Here too the macro can be given two parameters for the first and last numbers in the
names of the data files; an analogous macro can be written to read the files from
m010 to m080 (the number of zeros in the name changes).
%macro lettura(in,fin);
%do n=&in %to &fin;
data ant&n;
infile "C:\ANTARTIDE\m00&n..asc" firstobs=2;
input profond temper salinita
x1 x2 x3 x4 fluoresc x5 x6
densita x7;
stazione=&n;
drop x1-x7;
run;
%end;
%mend lettura;
%macro lettura2(in,fin);
%do n=&in %to &fin;
data ant&n;
infile "C:\ANTARTIDE\m0&n..asc" firstobs=2;
input profond temper salinita
x1 x2 x3 x4 fluoresc x5 x6
densita x7;
stazione=&n;
drop x1-x7;
run;
%end;
%mend lettura2;
%lettura(1,9);
%lettura2(10,80)
The two macros can be merged into one by using the macro language statement %if... %then...
%else...;
%macro lettura(in,fin);
%do n=&in %to &fin;
data ant&n;
%if &n < 10 %then
infile "C:\ANTARTIDE\m00&n..asc" firstobs=2;
%else
infile "C:\ANTARTIDE\m0&n..asc" firstobs=2;
;
input profond temper salinita
x1 x2 x3 x4 fluoresc x5 x6
densita x7;
stazione=&n;
drop x1-x7;
run;
%end;
%mend lettura;
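Note the ; after the %if...%then...%else...; statement: the semicolons inside it end the %then and
%else clauses, so a separate one is needed to end the generated INFILE statement.
The macro is called with: %lettura(1,80);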
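As an alternative to the %if test, the file number can be padded with leading zeros. The following is a sketch under assumptions, not the authors' method: it uses %SYSFUNC with the PUTN function and the Z3. format, and the macro name lettura3 and the variable nnn are hypothetical. The path is the one used above.

```sas
%macro lettura3(in,fin);
  %local n nnn;
  %do n=&in %to &fin;
    /* Z3. pads to three digits with leading zeros: 7 -> 007, 42 -> 042 */
    %let nnn=%sysfunc(putn(&n,z3.));
    data ant&n;
      infile "C:\ANTARTIDE\m&nnn..asc" firstobs=2;
      input profond temper salinita
            x1 x2 x3 x4 fluoresc x5 x6
            densita x7;
      stazione=&n;
      drop x1-x7;
    run;
  %end;
%mend lettura3;

%lettura3(1,80);
```

With this approach the same INFILE statement covers every file, so no conditional logic is needed when the numbering moves from one digit to two.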
M2. SAS Macro Language: Reference
The SAS macro language consists of statements, functions, and automatic macro variables.
Macro Statements
A macro language statement instructs the macro processor to perform an operation. It consists of a string
of keywords, SAS names, and special characters and operators, and it ends in a semicolon. Some macro
language statements are allowed only in macro definitions, but you can use others anywhere in a SAS
session or job, either inside or outside macro definitions (referred to as open code). Macro Language
Statements Allowed in Macro Definitions and Open Code lists macro language statements that you can
use in both macro definitions and open code.
Macro Language Statements Allowed in Macro Definitions and Open Code
Statement     Description
%* comment    designates comment text
%DISPLAY      displays a macro window
%GLOBAL       creates macro variables that are available during the execution of an entire SAS session
%INPUT        supplies values to macro variables during macro execution
%KEYDEF       assigns a definition to or identifies the definition of a function key
%LET          creates a macro variable and assigns it a value
%MACRO        begins a macro definition
%PUT          writes text or the values of macro variables to the SAS log
%SYSCALL      invokes a SAS call routine
%SYSEXEC      issues operating system commands
%SYSLPUT      defines a new macro variable or modifies the value of an existing macro variable on a
              remote host or server
%SYSRPUT      assigns the value of a macro variable on a remote host to a macro variable on the local host
%WINDOW       defines customized windows
Macro Language Statements Allowed in Macro Definitions Only
Statement          Description
%DO                begins a %DO group
%DO, Iterative     executes statements repetitively, based on the value of an index variable
%DO %UNTIL         executes statements repetitively until a condition is true
%DO %WHILE         executes statements repetitively while a condition is true
%END               ends a %DO group
%GOTO              branches macro processing to the specified label
%IF-%THEN/%ELSE    conditionally processes a portion of a macro
%label:            identifies the destination of a %GOTO statement
%LOCAL             creates macro variables that are available only during the execution of the macro
                   where they are defined
%MEND              ends a macro definition
Statements That Perform Automatic Evaluation
Some macro statements perform an operation based on an evaluation of an arithmetic or logical
expression. They perform the evaluation by automatically calling the %EVAL function. If you get an
error message about a problem with %EVAL when a macro does not use %EVAL explicitly, check for
one of these statements. The macro statements that perform automatic evaluation are:
%DO macro-variable=expression %TO expression <%BY expression>;
%DO %UNTIL(expression);
%DO %WHILE(expression);
%IF expression %THEN action;
Macro Functions
In general, a macro language function processes one or more arguments and produces a result. You can
use all macro functions in both macro definitions and open code. Macro functions include character
functions, evaluation functions, and quoting functions.
Macro Functions
Function               Description
%BQUOTE, %NRBQUOTE     mask special characters and mnemonic operators in a resolved value at macro
                       execution.
%EVAL                  evaluates arithmetic and logical expressions using integer arithmetic.
%INDEX                 returns the position of the first character of a string.
%LENGTH                returns the length of a string.
%QUOTE, %NRQUOTE       mask special characters and mnemonic operators in a resolved value at macro
                       execution. Unmatched quotation marks ('") and parentheses ( () ) must be marked
                       with a preceding %.
%SCAN, %QSCAN          search for a word specified by its number. %QSCAN masks special characters and
                       mnemonic operators in its result.
%STR, %NRSTR           mask special characters and mnemonic operators in constant text at macro
                       compilation. Unmatched quotation marks ('") and parentheses ( () ) must be marked
                       with a preceding %.
%SUBSTR, %QSUBSTR      produce a substring of a character string. %QSUBSTR masks special characters
                       and mnemonic operators in its result.
%SUPERQ                masks all special characters and mnemonic operators at macro execution but
                       prevents resolution of the value.
%SYSEVALF              evaluates arithmetic and logical expressions using floating point arithmetic.
%SYSFUNC, %QSYSFUNC    execute SAS functions or user-written functions. %QSYSFUNC masks special
                       characters and mnemonic operators in its result.
%SYSGET                returns the value of a specified host environment variable.
%SYSPROD               reports whether a SAS software product is licensed at the site.
%UNQUOTE               unmasks all special characters and mnemonic operators for a value.
%UPCASE, %QUPCASE      convert characters to uppercase. %QUPCASE masks special characters and
                       mnemonic operators in its result.
Character Functions
Character functions change character strings or provide information about them.
Macro Character Functions
Function               Description
%INDEX                 returns the position of the first character of a string.
%LENGTH                returns the length of a string.
%SCAN, %QSCAN          search for a word that is specified by a number. %QSCAN masks special
                       characters and mnemonic operators in its result.
%SUBSTR, %QSUBSTR      produce a substring of a character string. %QSUBSTR masks special characters
                       and mnemonic operators in its result.
%UPCASE, %QUPCASE      convert characters to uppercase. %QUPCASE masks special characters and
                       mnemonic operators in its result.
For macro character functions that have a Q form (for example, %SCAN and %QSCAN), the two
functions work alike except that the function beginning with Q masks special characters and mnemonic
operators in its result. In general, use the function beginning with Q when an argument has been
previously masked with a macro quoting function or when you want the result to be masked (for example,
when the result may contain an unmatched quotation mark or parenthesis).
Many macro character functions have names corresponding to SAS character functions and perform
similar tasks (such as %SUBSTR and SUBSTR). But, macro functions operate before the DATA step
executes. Consider this DATA step:
data out.%substr(&sysday,1,3);
/* macro function */
set in.weekly (keep=name code sales);
length location $4;
location=substr(code,1,4);
/* SAS function */
run;
Running the program on Monday creates the data set name OUT.MON, as shown:
data out.MON;
/* macro function */
set in.weekly (keep=name code sales);
length location $4;
location=substr(code,1,4);
/* SAS function */
run;
Suppose that the IN.WEEKLY variable CODE contains the values cary18593 and apex19624. The SAS
function SUBSTR operates during DATA step execution and assigns these values to the variable
LOCATION, cary and apex.
Evaluation Functions
Evaluation functions evaluate arithmetic and logical expressions. They temporarily convert the operands
in the argument to numeric values. Then, they perform the operation specified by the operand and convert
the result to a character value. The macro processor uses evaluation functions to:
• make character comparisons
• evaluate logical (Boolean) expressions
• assign numeric properties to a token, such as an integer in the argument of a function.
Macro Evaluation Functions
Function      Description
%EVAL         evaluates arithmetic and logical expressions using integer arithmetic
%SYSEVALF     evaluates arithmetic and logical expressions using floating point arithmetic
%EVAL is called automatically by the macro processor to evaluate expressions in the arguments to the
statements that perform evaluation, listed on Statements That Perform Automatic Evaluation, and in the
following functions:
%QSCAN(argument,n<,delimiters>)
%QSUBSTR(argument,position<,length>)
%SCAN(argument,n<,delimiters>)
%SUBSTR(argument,position<,length>)
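The difference between integer and floating-point evaluation, and the fact that %LET performs no evaluation by itself, can be seen in a short sketch. The variable names a, b and c are arbitrary and chosen only for illustration.

```sas
%put %eval(7/2);        /* integer arithmetic: writes 3 */
%put %sysevalf(7/2);    /* floating-point arithmetic: writes 3.5 */
%put %eval(3 > 2);      /* a true logical expression yields 1 */

%let a=10;
%let b=&a+5;            /* no evaluation: b holds the text 10+5 */
%let c=%eval(&a+5);     /* explicit evaluation: c holds 15 */
%put b=&b c=&c;         /* writes b=10+5 c=15 */
```

Because macro variables hold only text, an arithmetic result must be requested explicitly with %EVAL or %SYSEVALF, or implicitly through one of the statements that call %EVAL automatically.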
Quoting Functions
Macro quoting functions mask special characters and mnemonic operators so the macro processor
interprets them as text instead of elements of the macro language.
…………..
Other Functions
Three other macro functions do not fit into the earlier categories, but they provide important information.
Macro Quoting Functions lists these functions:
Macro Quoting Functions
Function               Description
%SYSFUNC, %QSYSFUNC    execute SAS language functions or user-written functions within the macro
                       facility.
%SYSGET                returns the value of the specified host environment variable. For details, see the
                       SAS Companion for your operating system.
%SYSPROD               reports whether a SAS software product is licensed at the site.
…………….
Macro Variables: Introduction
Macro variables are tools that enable you to dynamically modify the text in a SAS program through
symbolic substitution. You can assign large or small amounts of text to macro variables, and after that,
you can use that text by simply referencing the variable that contains it.
Macro variable values have a maximum length of 32K characters. The length of a macro variable is
determined by the text assigned to it instead of an explicit length declaration. So its length varies with
each value it contains. Macro variables contain only character data. However, the macro facility has
features that allow a variable to be evaluated as a number when it contains a value that can be interpreted
as a number. The value of a macro variable remains constant until it is explicitly changed. Macro
variables are independent of SAS data set variables.
Macro variables defined by macro programmers are called user-defined macro variables. Those defined
by the SAS System are called automatic macro variables. You can define and use macro variables
anywhere in SAS programs, except within data lines.
When a macro variable is defined, the macro processor adds it to one of the program's macro variable
symbol tables. When a macro variable is defined in a statement that is outside a macro definition (called
open code) or when the variable is created automatically by the SAS System (except SYSPBUFF), the
variable is held in the global symbol table, which SAS creates at the beginning of a SAS session. When a
macro variable is defined within a macro and is not explicitly defined as global, the variable is typically
held in the macro's local symbol table, which SAS creates when the macro starts executing.
When it is in the global symbol table, a macro variable exists for the remainder of the current SAS
session. A variable in the global symbol table is called a global macro variable. It has global scope
because its value is available to any part of the SAS session.
When it is in a local symbol table, a macro variable exists only during execution of the macro in which it
is defined. A variable in a local symbol table is called a local macro variable. It has local scope because
its value is available only until the macro stops executing. Chapter 2 contains figures that illustrate a
program with a global and a local symbol table.
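The contents of the symbol tables can be inspected from the log with the special %PUT arguments _LOCAL_ and _GLOBAL_. The following is a minimal sketch; the macro name scope_demo and the variable tmp are hypothetical.

```sas
%let dsn=Newdata;            /* defined in open code: stored in the global symbol table */

%macro scope_demo;           /* hypothetical macro, for illustration only */
  %local tmp;
  %let tmp=only inside;      /* stored in the local symbol table of SCOPE_DEMO */
  %put _local_;              /* lists the local variables, here TMP */
  %put _global_;             /* lists the global variables, including DSN */
%mend scope_demo;

%scope_demo
```

After the macro finishes, a reference to &tmp in open code no longer resolves, while &dsn remains available for the rest of the session.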
Using Macro Variables
After a macro variable is created, you typically use the variable by referencing it with an ampersand
preceding its name (&variable-name), which is called a macro variable reference. These references
perform symbolic substitutions when they resolve to their value. You can use these references anywhere
in a SAS program. To resolve a macro variable reference that occurs within a literal string, enclose the
string in double quotation marks. Macro variable references that are enclosed in single quotation marks
are not resolved. Compare the following statements that assign a value to macro variable DSN and use it
in a TITLE statement:
%let dsn=Newdata;
title1 "Contents of Data Set &dsn";
title2 'Contents of Data Set &dsn';
In the first TITLE statement, the macro processor resolves the reference by replacing &DSN with the
value of macro variable DSN. In the second TITLE statement, the value for DSN does not replace
&DSN. The SAS System sees the following statements:
TITLE1 "Contents of Data Set Newdata";
TITLE2 'Contents of Data Set &dsn';
You can refer to a macro variable as many times as you need to in a SAS program. The value remains
constant until you change it. For example, this program refers to macro variable DSN twice:
%let dsn=Newdata;
data temp;
set &dsn;
if age>=20;
run;
proc print;
title "Subset of Data Set &dsn";
run;
Each time the reference &DSN appears, the macro processor replaces it with Newdata. Thus, the SAS
System sees these statements:
DATA TEMP;
SET NEWDATA;
IF AGE>=20;
RUN;
PROC PRINT;
TITLE "Subset of Data Set Newdata";
RUN;
Note: If you reference a macro variable that does not exist, a warning message is printed in the SAS log.
For example, if macro variable JERRY is misspelled as JERY, the following produces an unexpected
result:
%let jerry=student;
data temp;
x="produced by &jery";
run;
This produces the following message:
WARNING: Apparent symbolic reference JERY not resolved.
Combining Macro Variable References with Text
It is often useful to place a macro variable reference next to leading or trailing text (for example,
DATA=PERSNL&YR.EMPLOYES, where &YR contains two characters for a year), or to reference
adjacent variables (for example, &MONTH&YR). This allows you to reuse the same text in several
places or to reuse a program because you can change values for each use.
To reuse the same text in several places, you can write a program with macro variable references
representing the common elements. You can change all the locations with a single %LET statement, as
shown:
%let name=sales;
data new&name;
set save.&name;
more SAS statements
if units>100;
run;
After macro variable resolution, the SAS System sees these statements:
DATA NEWSALES;
SET SAVE.SALES;
more SAS statements
IF UNITS>100;
RUN;
Notice that macro variable references do not require the concatenation operator as the DATA step does.
The SAS System forms the resulting words automatically.
Delimiting Macro Variable Names within Text
Sometimes when you use a macro variable reference as a prefix, the reference does not resolve as you
expect if you simply concatenate it. Instead, you may need to delimit the reference by adding a period to
the end of it.
A period immediately following a macro variable reference acts as a delimiter; that is, a period at the end
of a reference forces the macro processor to recognize the end of the reference. The period does not
appear in the resulting text.
Continuing with the example above, suppose that you need another DATA step that uses the names
SALES1, SALES2, and INSALES.TEMP. You might add the following step to the program:
/* first attempt to add suffixes--incorrect */
data &name1 &name2;
set in&name.temp;
run;
After macro variable resolution, the SAS System sees these statements:
DATA &NAME1 &NAME2;
SET INSALESTEMP;
RUN;
None of the macro variable references have resolved as you intended. The macro processor issues
warning messages, and the SAS System issues syntax error messages. Why?
Because NAME1 and NAME2 are valid SAS names, the macro processor searches for those macro
variables rather than for NAME, and the references pass into the DATA statement without resolution.
In a macro variable reference, the word scanner recognizes that a macro variable name has ended when it
encounters a character that is not allowed in a SAS name. However, you can use a period ( . ) as a
delimiter for a macro variable reference. For example, to cause the macro processor to recognize the end
of the word NAME in this example, use a period as a delimiter between &NAME and the suffix:
/* correct version */
data &name.1 &name.2;
The SAS System now sees this statement:
DATA SALES1 SALES2;
Creating a Period to Follow Resolved Text
Sometimes you need a period to follow the text resolved by the macro processor. For example, a
two-level data set name needs to include a period between the libref and data set name.
When the character following a macro variable reference is a period, use two periods. The first is the
delimiter for the macro reference, and the second is part of the text. For example,
set in&name..temp;
After macro variable resolution, the SAS System sees this statement:
SET INSALES.TEMP;
You can end any macro variable reference with a delimiter, but the delimiter is necessary only if the
characters that follow can be part of a SAS name. For example, both of these TITLE statements are
correct:
title "&name.--a report";
title "&name--a report";
They produce:
TITLE "sales--a report";
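The delimiter rules can be checked quickly by writing the resolved text to the log with %PUT. This is only a sketch that reuses the macro variable NAME from the example above.

```sas
%let name=sales;
%put &name.1;          /* period ends the reference: writes sales1 */
%put in&name..temp;    /* first period is the delimiter, second is text: writes insales.temp */
%put in&name.temp;     /* single period is consumed as delimiter: writes insalestemp */
```

Testing a reference this way before placing it in a DATA or SET statement makes it easier to spot a missing or extra period.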
Forcing a Macro Variable to Be Local
At times you need to ensure that the macro processor creates a local macro variable rather than changing
the value of an existing macro variable. In this case, use the %LOCAL statement to create the macro
variable.
Explicitly make all macro variables created within macros local when you do not need their values after
the macro stops executing. Debugging large macro programs is easier if you minimize the possibility
of inadvertently changing a macro variable's value. Also, local macro variables do not exist after their
defining macro finishes executing, while global variables exist for the duration of the SAS session;
therefore, local variables use less overall storage.
Suppose you want to use the macro NAMELST to create a list of names for a VAR statement, as shown
here:
%macro namelst(name,number);
%do n=1 %to &number;
&name&n
%end;
%mend namelst;
You invoke NAMELST in this program:
%let n=North State Industries;
proc print;
var %namelst(dept,5);
title "Quarterly Report for &n";
run;
After macro execution, the SAS compiler sees the following statements:
proc print;
var dept1 dept2 dept3 dept4 dept5;
title "Quarterly Report for 6";
run;
The macro processor changes the value of the global variable N each time it executes the iterative %DO
loop. (After the loop stops executing, the value of N is 6, as described in " %DO" in Chapter 13, "Macro
Language Dictionary.") To prevent conflicts, use a %LOCAL statement to create a local variable N, as
shown here:
%macro namels2(name,number);
%local n;
%do n=1 %to &number;
&name&n
%end;
%mend namels2;
Now execute the same program:
%let n=North State Industries;
proc print;
var %namels2(dept,5);
title "Quarterly Report for &n";
run;
The macro processor generates the following statements:
proc print;
var dept1 dept2 dept3 dept4 dept5;
title "Quarterly Report for North State Industries";
run;
Global and Local Variables with the Same Name shows the symbol tables before NAMELS2 executes,
while NAMELS2 is executing, and when the macro processor encounters the reference &N in the TITLE
statement.
Creating Global Macro Variables
The %GLOBAL statement creates a global macro variable if a variable with the same name does not
already exist there, regardless of what scope is current.
For example, in the macro NAME4, the macro CONDITN contains a %GLOBAL statement that creates
the macro variable COND as a global variable:
%macro conditn;
%global cond;
%let old=sales;
%let cond=cases>0;
%mend conditn;
Here is the rest of the program:
%let new=inventry;
%macro name4;
%let new=report;
%let old=warehse;
%conditn
data &new;
set &old;
if &cond;
run;
%mend name4;
%name4
Invoking NAME4 generates these statements:
data report;
set sales;
if cases>0;
run;
Suppose you want to put the SAS DATA step statements outside NAME4. In this case, all the macro
variables must be global for the macro processor to resolve the references. You cannot add OLD to the
%GLOBAL statement in CONDITN because the %LET statement in NAME4 has already created OLD
as a local variable to NAME4 by the time CONDITN begins to execute. (You cannot use the %GLOBAL
statement to make an existing local variable global.)
Thus, to make OLD global, use the %GLOBAL statement before the variable reference appears anywhere
else, as shown here in the macro NAME5:
%let new=inventry;
%macro conditn;
%global cond;
%let old=sales;
%let cond=cases>0;
%mend conditn;
%macro name5;
%global old;
%let new=report;
%let old=warehse;
%conditn
%mend name5;
%name5
data &new;
set &old;
if &cond;
run;
Now the %LET statement in NAME5 changes the value of the existing global variable OLD rather than
creating OLD as a local variable. The SAS compiler sees the following statements:
data report;
set sales;
if cases>0;
run;
N. WORKING WITH MATRICES IN SAS
N1. THE SAS/IML MODULE
Matrix computation in SAS is carried out through a procedure that is started with the
statement:
proc iml;
and ended with the statement:
quit;
Statements can be executed interactively or placed in programs.
The procedure has its own programming language, which handles arithmetic and character
expressions, data input and output, and control flow (if, do, goto, ...).
The fundamental data elements are matrices.
Expressions use operators that apply to entire matrices.
The procedure includes a very large vocabulary of operators, functions, and routines.
There is no need to declare dimensions, storage, attributes, ...
It has one drawback: moving from the data structures of the SAS language
(SAS data sets) to the corresponding matrix structures, and back, is not immediate.
N2. ASSIGNING VALUES DIRECTLY TO A MATRIX (OR A VECTOR)
Examples:
To build a matrix X with 3 rows and 2 columns, $X = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}$,
use the statement:
x = {1 2, 3 4, 5 6};
The values are written inside braces { }, entered row by row and separated by blanks;
rows are separated by commas.
Similarly, to build a vector v with 3 elements, v = (1 2 3), use the statement:
v = {1 2 3};
The values are written inside braces, separated by blanks. A vector defined this way is a
row vector.
Statements are also available for repeating values.
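A minimal sketch of the assignments above, together with three ways of repeating values that are not named in the text and are shown here as an assumption about what it alludes to: the J function, the REPEAT function, and repetition factors in brackets inside a matrix literal.

```sas
proc iml;
  x = {1 2, 3 4, 5 6};      /* 3 x 2 matrix, entered row by row */
  v = {1 2 3};              /* row vector with 3 elements */
  z = j(2, 3, 0);           /* 2 x 3 matrix with every element equal to 0 */
  r = repeat(v, 2, 1);      /* stacks v twice: a 2 x 3 matrix */
  w = {[3] 1 [2] 0};        /* repetition factors: same as {1 1 1 0 0} */
  print x, v, z, r, w;
quit;
```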
N3. FROM SAS DATA SET TO MATRIX AND BACK
The most common way to assign values to a matrix is to use the values stored in a
SAS data set.
BUILDING A MATRIX FROM A DATA SET
To build a matrix named A whose columns are the variables, and whose rows are all the observations, of
a SAS data set named PIPPO, use the statements:
USE pippo;
READ ALL INTO a;
The USE statement opens the data set (if it is omitted, the current data set is used) and the
READ statement reads from the data set and builds the matrix.
Variables and/or observations of the data set can be selected.
The (non-exhaustive) syntax of the USE and READ statements is as follows:
USE DataSet <VAR operand> <WHERE (expression)>;
The VAR operand selects some of the variables of the data set; WHERE selects
observations.
Example:
USE pippo VAR {nome indirizzo} WHERE (prov = 'GE');
only the variables "nome" and "indirizzo" of the residents of the province of Genoa are considered.
READ <range> <VAR operand> <POINT operand> <WHERE (expression)> INTO matrix-name;
The VAR operand and WHERE are used as above (the variables must be all
numeric or all character; numeric is the default).
The range further specifies which observations to consider.
<range> =
ALL        all observations
CURRENT    the current observation (default)
NEXT       the observation after the current one
AFTER      all observations after the current one
<POINT operand>
Examples of the POINT operand:
POINT 10                        observation 10
POINT {10 25}                   observations 10 and 25
POINT (20:25)                   observations 20 through 25
POINT ((20:25) || (30:35))      observations 20 through 25 and 30 through 35
Examples:
READ ALL VAR {X Y} INTO MAT;    all observations of the variables X and Y
READ POINT 23 INTO MAT;         all numeric variables of observation 23
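The USE and READ statements can be tried on a data set that ships with SAS. The sketch below assumes that the sample data set SASHELP.CLASS is available in the installation; the matrix names m and nomi are arbitrary.

```sas
proc iml;
  use sashelp.class;                                  /* open the data set */
  read all var {height weight} into m;                /* two numeric variables -> matrix m */
  read all var {name} where(sex = "F") into nomi;     /* character variable, selected rows */
  close sashelp.class;                                /* close the data set when done */
  print (nrow(m)) (ncol(m));                          /* dimensions of m */
  print nomi;
quit;
```

Numeric and character variables are read in separate READ statements because a matrix must be entirely numeric or entirely character.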
BUILDING A DATA SET FROM A MATRIX
Example:
To build a SAS data set named PLUTO from a matrix named A, with the columns of A as variables
and the rows of A as observations, use the statements:
CREATE pluto FROM a;
APPEND FROM a;
The CREATE statement creates the data set; in this simple form the variables are named COL1,
COL2, ..., COLn.
The APPEND statement adds data to the end of the data set.
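If the default names COL1, COL2, ... are not wanted, the COLNAME= option of the FROM clause can supply the variable names. This is a sketch; the matrix values and the names x and y are chosen only for illustration.

```sas
proc iml;
  a = {1 2, 3 4, 5 6};
  create pluto from a[colname={"x" "y"}];   /* variables named x and y instead of COL1, COL2 */
  append from a;                            /* write the rows of a as observations */
  close pluto;                              /* close the new data set */
quit;

proc print data=pluto;
run;
```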
N4. MATRIX COMPUTATIONS (main aspects)
In what follows we denote by:
A, B, C, ...   matrices
v              vectors
s              scalars
ARITHMETIC OPERATIONS
Element-by-element operations
M = - A        changes the sign
M = A + B      adds two matrices
M = A + s      adds the value s to each element of A
M = A - B      subtracts
M = A - s      subtracts the value s from each element of A
M = A / B      divides
M = A / s      divides each element of A by the value s
M = A # B      multiplies (Schur or Hadamard product)
M = A # v      multiplies each element in the rows of A by the corresponding element of v
               (v must have as many elements as A has rows)
M = A # s      multiplies each element of A by the value s
M = A ## B     raises each element of A to the power given by the corresponding element of B
               (if an element of A is < 0, the corresponding element of B must be an integer)
M = A <> B     takes the element-by-element maximum
M = A >< B     takes the element-by-element minimum
Matrix operations
M = A * B      row-by-column product
M = A ** s     equivalent to M = A ∗ A ∗ ... ∗ A (s times) (A must be square)
M = A ** (-1)  equivalent to M = INV(A), the inverse matrix
M = A @ B      Kronecker product or direct product
               if A is an n x m matrix and B is an h x k matrix, then M is an nh x mk matrix
s = TRACE(A)   sums the diagonal elements
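The difference between the element-by-element operators and the matrix operators can be seen on two small matrices. This is a sketch with values chosen only for illustration.

```sas
proc iml;
  a = {1 2, 3 4};
  b = {5 6, 7 8};
  e = a # b;        /* element by element: {5 12, 21 32} */
  p = a * b;        /* rows by columns:    {19 22, 43 50} */
  q = a ** 2;       /* same as a * a:      {7 10, 15 22} */
  k = a @ b;        /* Kronecker product: a 4 x 4 matrix */
  t = trace(a);     /* 1 + 4 = 5 */
  print e p q t, k;
quit;
```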
COMPARISON OPERATIONS (they act element by element; M is a matrix of 0s and 1s)
M = A < B
M = A <= B
M = A > B
M = A >= B
M = A = B
M = A ^= B
LOGICAL OPERATIONS (they act element by element)
M = A & B      an element of M is 1 if the corresponding elements of A and B are both ≠ 0
M = A | B      an element of M is 1 if at least one of the corresponding elements of A and B is ≠ 0
M = ^A         an element of M is 1 if the corresponding element of A is = 0
INQUIRY FUNCTIONS
These check whether all or some elements of a matrix are different from 0, and return the
number of elements equal to 0, the number of rows and columns, ...
REDUCTION FUNCTIONS
These compute the maximum, the minimum, the sum, and the sum of squares of the elements of
a matrix.
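The text does not name the individual functions; the sketch below uses some SAS/IML functions that cover these tasks (NROW, NCOL, ANY, ALL, MAX, MIN, SUM, SSQ). The choice of functions and the sample matrix are assumptions made for illustration.

```sas
proc iml;
  a = {1 0 -2, 4 5 0};
  n_rows   = nrow(a);        /* 2 */
  n_cols   = ncol(a);        /* 3 */
  any_nz   = any(a);         /* 1: at least one element is nonzero */
  all_nz   = all(a);         /* 0: not every element is nonzero */
  n_zero   = sum(a = 0);     /* the comparison gives 0/1, so the sum counts the zeros: 2 */
  largest  = max(a);         /* 5 */
  smallest = min(a);         /* -2 */
  total    = sum(a);         /* 1 + 0 - 2 + 4 + 5 + 0 = 8 */
  sumsq    = ssq(a);         /* 1 + 0 + 4 + 16 + 25 + 0 = 46 */
  print n_rows n_cols any_nz all_nz n_zero largest smallest total sumsq;
quit;
```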
MANIPULATION AND RESHAPING FUNCTIONS AND OPERATIONS
M = A` or M = T(A)   transpose (the operator is the backquote character)
M = DIAG(A)          creates a diagonal matrix from the diagonal elements of A
M = DIAG(v)          creates a diagonal matrix from the elements of v
v = VECDIAG(A)       creates a vector from the diagonal of a matrix
M = I(s)             creates the s x s identity matrix
Other functions insert rows or columns, remove elements, create block-diagonal
matrices, matrices with repeated values, and submatrices, and concatenate matrices horizontally and
vertically, ...
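Some of these operations can be sketched as follows. The concatenation operators || and //, the BLOCK function and subscripting are standard SAS/IML features that the text only alludes to, so their selection here is an assumption.

```sas
proc iml;
  a = {1 2, 3 4};
  b = {5 6, 7 8};
  h  = a || b;          /* horizontal concatenation: 2 x 4 */
  s  = a // b;          /* vertical concatenation:   4 x 2 */
  d  = block(a, b);     /* block-diagonal matrix:    4 x 4 */
  c  = a[ ,2];          /* submatrix: second column of a, {2, 4} */
  tr = t(a);            /* transpose: {1 3, 2 4} */
  dg = vecdiag(a);      /* diagonal of a as a column vector: {1, 4} */
  id = i(2);            /* 2 x 2 identity matrix */
  print h, s, d, c tr dg id;
quit;
```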
LINEAR ALGEBRA FUNCTIONS AND CALLS
s = DET(M)            determinant
M = INV(A)            inverse
CALL EIGEN(v,M,A)     creates a vector v with the eigenvalues (in decreasing order) and a
                      matrix M whose columns are the corresponding eigenvectors of a square
                      matrix A
v = EIGVAL(A)         creates a vector v with the eigenvalues of A
M = EIGVEC(A)         creates a matrix M whose columns are the eigenvectors of A
CALL SVD(U,q,V,A)     decomposes the matrix A, of dimensions m x n with m ≥ n:
                      A = U ∗ diag(q) ∗ V'
                      where:
                      U'U = V'V = VV' = I_n
                      U, m x n matrix, normalized eigenvectors of AA'
                      q, vector of length n, singular values (square roots of the eigenvalues of AA' and A'A)
                      V, n x n matrix, eigenvectors of A'A
U = ROOT(A)           A must be symmetric and non-negative definite
                      creates an upper triangular matrix such that U'U = A
x = SOLVE(A,B)        A must be square and nonsingular
                      solves the system of linear equations A x = B
                      do not use x = INV(A) ∗ B
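A small sketch comparing SOLVE with the explicit inverse; the system is chosen only for illustration.

```sas
proc iml;
  a = {2 1, 1 3};
  b = {3, 5};
  x     = solve(a, b);     /* solves a*x = b directly: x = {0.8, 1.4} */
  x_inv = inv(a) * b;      /* same solution, but the inverse is formed explicitly */
  resid = a*x - b;         /* residual, zero up to rounding */
  print x x_inv resid;
quit;
```

Both forms give the same answer on this well-conditioned example; SOLVE is generally preferred because it avoids computing the inverse, which costs more and can lose accuracy.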