This presentation covers DICTIONARY.COLUMNS, moving data between Hive, SAS, and Teradata, reading external files, handling long column names, and PROC FEDSQL.
2. Agenda
• Motivation
• Introduction
• Appending wrong data
• Long column names
• Mismatched number of columns
• Conclusion
4. Introduction
• DICTIONARY: a special read-only library that contains metadata about the current SAS session
• DICTIONARY.DICTIONARIES describes every dictionary table and its columns
5. Introduction (contd.)
First rows returned by the query below:
Member Name                Data Set Label
CATALOGS                   Catalogs and catalog-specific information
CHECK_CONSTRAINTS          Check constraints
COLUMNS                    Columns from every table
CONSTRAINT_COLUMN_USAGE    Constraint column usage
CONSTRAINT_TABLE_USAGE     Constraint table usage
DATAITEMS                  Information Map Data Items
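proc sql;
   /* DICTIONARY.DICTIONARIES has one row per column of each dictionary */
   /* table, so DISTINCT reduces it to one row per table                */
   select distinct memname, memlabel
      from dictionary.dictionaries;
quit;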
6. Introduction (contd.)
proc sql;
   /* DESCRIBE TABLE writes the table definition to the SAS log */
   describe table dictionary.columns;
quit;
SAS log (abridged):
Create table DICTIONARY.COLUMNS
(
libname char(8) label='Library Name',
memname char(32) label='Member Name',
memtype char(8) label='Member Type',
name char(32) label='Column Name',
type char(4) label='Column Type',
length num label='Column Length',
npos num label='Column Position',
varnum num label='Column Number in Table',
label char(256) label='Column Label',
format char(49) label='Column Format');
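Querying DICTIONARY.COLUMNS returns the same metadata as rows instead of as a table definition. The sketch below lists the columns of WORK.PERSON (the data set created in the next slide) in position order; LIBNAME and MEMNAME values are stored in uppercase, so the WHERE clause compares against uppercase literals. In a DATA step, the same metadata is available through the SASHELP.VCOLUMN view. The output data set name WORK.PERSON_COLS is a hypothetical name chosen for illustration.

```sas
/* List the columns of WORK.PERSON in their physical order.           */
/* LIBNAME and MEMNAME are stored in uppercase in DICTIONARY.COLUMNS. */
proc sql;
   select varnum, name, type, length, label
      from dictionary.columns
      where libname = 'WORK'
        and memname = 'PERSON'
      order by varnum;
quit;

/* Same metadata through the SASHELP.VCOLUMN view in a DATA step.     */
/* WORK.PERSON_COLS is a hypothetical output name.                    */
data work.person_cols;
   set sashelp.vcolumn(keep=libname memname name type length varnum);
   where libname = 'WORK' and memname = 'PERSON';
run;
```

The SQL form suits ad hoc inspection; the view form is convenient when the metadata feeds further DATA step logic.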
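8. Reading an external file
data WORK.PERSON;
   /* Assumed lengths: list input defaults character variables to */
   /* 8 bytes, which would truncate 'Rodriguez'                    */
   length fname lname $20 state $2;
   infile "&filenm." firstobs=2 lrecl=10240 missover dsd;
   input fname $ lname $ state $ imp;
run;
Resulting WORK.PERSON:
fname lname state imp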
James Smith KY 200
Maria Rodriguez NY 300
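9. External file with columns in a different order
data WORK.PERSON1;
   /* Assumed lengths; without them list input uses 8 bytes */
   length fname lname $20 state $2;
   infile "&filenm." firstobs=2 lrecl=10240 missover dsd;
   input fname $ lname $ state $ imp;
run;
When the external file stores the columns in a different order:
fname imp lname state
Thomas 100 Miller NY
the DATA step still reads the values by position, so they land in the wrong variables (state is cut to 2 characters and imp becomes missing because "NY" is not numeric):
fname lname state imp
Thomas 100 Mi .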
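One defensive option is to check the header row before loading. The sketch below reads only the first line of the file and compares it with the expected column order; the macro variables EXPECTED_HEADER and HEADER_OK are hypothetical names, and the expected header string is an assumption based on the file layout shown above (unquoted, comma-separated names).

```sas
/* Hypothetical expected header, based on the layout shown above */
%let expected_header = FNAME,LNAME,STATE,IMP;

data _null_;
   infile "&filenm." obs=1 lrecl=10240;
   length header $200;
   input;                                   /* loads the line into _INFILE_ */
   /* The 's' modifier removes blanks, tabs, CR and LF; UPCASE ignores case */
   header = upcase(compress(_infile_, , 's'));
   if header = "&expected_header" then call symputx('header_ok', 1);
   else do;
      call symputx('header_ok', 0);
      put 'ERROR: Unexpected column order in header: ' _infile_;
   end;
run;

%put NOTE: header_ok=&header_ok;
```

A later step can test &header_ok and skip the load when it is 0.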
10. Appending wrong data
PROC APPEND matches variables by name, so it adds the misaligned row to the production table:
proc append base=PROD.PERSON data=WORK.PERSON1;
run;
fname lname state imp
James Smith KY 200
Maria Rodriguez NY 300
Thomas 100 Mi .
11. Fixing the column order issue
• Step 1: store the column order of the reference (production) table in a macro variable
proc sql noprint;
   select name into :standard separated by ','
      from dictionary.columns
      where upper(libname) = 'PROD'
        and upper(memname) = 'PERSON'
      order by varnum;   /* make the column order explicit */
quit;
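13. Fixing the column order: step 2
• Step 2: store the column order of the newly read table in a second macro variable
proc sql noprint;
   select name into :test separated by ','
      from dictionary.columns
      where upper(libname) = 'WORK'
        and upper(memname) = 'PERSON1'
      order by varnum;   /* make the column order explicit */
quit;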
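The two macro variables can then be compared before anything reaches the production table. This is a minimal sketch, assuming &standard and &test were created by the two queries above and that WORK.PERSON1 takes its column names from the file header (for example, PROC IMPORT with GETNAMES=YES); if the names come from a positional INPUT statement, both lists always match. The macro name APPEND_IF_SAME_ORDER is hypothetical.

```sas
/* Hypothetical macro: append only when both column lists match.     */
/* %SUPERQ masks the commas in the lists; %QUPCASE ignores case.     */
%macro append_if_same_order;
   %if %qupcase(%superq(standard)) = %qupcase(%superq(test)) %then %do;
      proc append base=PROD.PERSON data=WORK.PERSON1;
      run;
   %end;
   %else %do;
      %put ERROR: Column order differs. PROD.PERSON: %superq(standard);
      %put ERROR- WORK.PERSON1: %superq(test);
   %end;
%mend append_if_same_order;

%append_if_same_order
```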
16. Issue when moving data from Hive to Teradata
data teradb.shoppingmart;
   set hive_db.shoppingmart;
run;
• ERROR: Variable shoppingmartactivitymodification has been defined as both character and numeric.
17. Long column name issue
• Hive allows column names of up to 128 characters, while SAS allows at most 32
• Two Hive columns:
  shoppingmartactivitymodificationdatetime (character variable)
  shoppingmartactivitymodificationcustid (numeric variable)
• After truncation to 32 characters, both become shoppingmartactivitymodification, which causes the error above
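The collision can be reproduced with plain string arithmetic. This sketch applies simple truncation to the first 32 characters, as the slide describes; it is only an illustration and does not call SAS/ACCESS, whose own name handling may differ.

```sas
/* Both Hive names share the same first 32 characters */
data _null_;
   length long1 long2 $128 short1 short2 $32;
   long1  = 'shoppingmartactivitymodificationdatetime';  /* 40 characters */
   long2  = 'shoppingmartactivitymodificationcustid';    /* 38 characters */
   short1 = substr(long1, 1, 32);   /* keep the first 32 characters */
   short2 = substr(long2, 1, 32);
   put short1= / short2=;
   if short1 = short2 then
      put 'WARNING: Both Hive columns map to the same SAS name.';
run;
```

Both PUT lines show shoppingmartactivitymodification, so one SAS variable would have to hold both a character and a numeric column.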
18. Help from DICTIONARY.COLUMNS
• For a DBMS table, the NAME column of DICTIONARY.COLUMNS holds the SAS column name and the LABEL column holds the original DBMS column name
• In DICTIONARY.COLUMNS, NAME is at most 32 characters long and LABEL at most 256
• Because Hive column names are at most 128 characters, they are never truncated in LABEL
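19. Help from DICTIONARY.COLUMNS (contd.)
proc sql;
   /* SAS column names in HIVE_DB.SHOPPINGMART that map to more than */
   /* one DBMS column (same NAME, different LABEL)                   */
   select a.name, a.label
      from dictionary.columns a
           inner join dictionary.columns b
           on  a.libname = b.libname
           and a.memname = b.memname
           and a.name    = b.name
           and a.label  ne b.label
      where a.libname = 'HIVE_DB'
        and a.memname = 'SHOPPINGMART';
quit;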
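The same check can be written without a self-join. This sketch assumes, as the query above does, that DICTIONARY.COLUMNS lists both Hive columns under the same truncated NAME; PROC SQL remerges COUNT(*) with the detail rows and writes a NOTE about remerging to the log. The second query lists every column whose original DBMS name was longer than the 32-character SAS limit.

```sas
proc sql;
   /* SAS names that occur more than once, with their full DBMS names */
   select name, label
      from dictionary.columns
      where libname = 'HIVE_DB'
        and memname = 'SHOPPINGMART'
      group by name
      having count(*) > 1
      order by name, label;

   /* Columns whose original DBMS name exceeds 32 characters */
   select name, label
      from dictionary.columns
      where libname = 'HIVE_DB'
        and memname = 'SHOPPINGMART'
        and length(label) > 32;
quit;
```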
20. Solving the long column name issue with PROC FEDSQL
proc fedsql;
   /* FedSQL is not bound by the 32-character SAS name limit */
   create table teradb.SHOPPINGMART as
      select * from hive_db.SHOPPINGMART;
quit;
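A quick sanity check after the copy is to compare the number of columns on both sides. This sketch reuses the librefs from the slides; matching counts do not prove the names are identical, but a mismatch flags a problem immediately.

```sas
/* Compare column counts of the Hive source and the Teradata target */
proc sql;
   select libname, count(*) as n_columns
      from dictionary.columns
      where memname = 'SHOPPINGMART'
        and libname in ('HIVE_DB', 'TERADB')
      group by libname;
quit;
```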
22. Failed Inserts
proc sql;
insert into TERADB.EMP
select * from SASDB.EMP;
quit;
• ERROR: Attempt to insert fewer columns than specified after the INSERT table name.
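23. Finding additional columns
proc sql;
   /* Columns of TERADB.EMP whose position (VARNUM) has no counterpart */
   /* in SASDB.EMP; assumes the extra columns come after the shared ones */
   select * from dictionary.columns
      where memname = 'EMP'
        and libname = 'TERADB'
        and varnum not in
            (select varnum from dictionary.columns
                where memname = 'EMP'
                  and libname = 'SASDB');
quit;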
24. Solving the additional columns issue: step 1
proc sql noprint;
   /* Names of the TERADB.EMP columns with a positional match in SASDB.EMP */
   select name into :columns separated by ','
      from dictionary.columns
      where memname = 'EMP'
        and libname = 'TERADB'
        and varnum in
            (select varnum from dictionary.columns
                where memname = 'EMP'
                  and libname = 'SASDB')
      order by varnum;
quit;
25. Solving the additional columns issue: final step
proc sql;
   /* Insert only into the matched columns; SASDB.EMP is read by position */
   insert into TERADB.EMP(&columns)
      select * from SASDB.EMP;
quit;
• This is only one way to solve the problem; for example, columns can also be matched by name instead of by position
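Matching by name is one such alternative. In this sketch the list of shared columns is built from a join on NAME and the same list is used on both sides of the INSERT, so the column order in SASDB.EMP no longer matters. The macro variable COMMON is a hypothetical name; the two steps are kept separate so that &common exists before the INSERT statement is compiled.

```sas
/* Columns present in both tables, matched by name, in TERADB order */
proc sql noprint;
   select t.name into :common separated by ','
      from dictionary.columns as t
           inner join dictionary.columns as s
           on upcase(t.name) = upcase(s.name)
      where t.libname = 'TERADB' and t.memname = 'EMP'
        and s.libname = 'SASDB'  and s.memname = 'EMP'
      order by t.varnum;
quit;

/* Same column list on both sides of the INSERT */
proc sql;
   insert into TERADB.EMP(&common)
      select &common from SASDB.EMP;
quit;
```

Columns that exist only in TERADB.EMP are simply left out of the INSERT, which works as long as the database allows them to be null or supplies a default.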
26. Conclusion
• DICTIONARY.COLUMNS is very helpful for understanding the columns of the tables in any assigned library.
• DICTIONARY.COLUMNS provides column metadata for SAS data sets as well as for RDBMS tables accessed through SAS/ACCESS librefs.
• Columns from heterogeneous sources can be compared with the help of DICTIONARY.COLUMNS.
27. Thanks
• Thanks for listening
• I would especially like to thank Paul Kirk Lafler for encouraging me to write papers.
• I would like to thank Lakshmi Nadiya Chintalapudi, Charannag Devarapalli, Sarika Elisetti, Anvesh Reddy Perati, Srirama Reddy, and Suryakiran Pothuraju for proofreading and for their valuable suggestions.