Hadoop Data Tagging and Metadata Extension

QueryIO provides manual and automated data tagging, which lets you define properties for files as they are written to HDFS. It automatically stores the basic metadata of files stored in HDFS and extends the metadata layer by letting you define additional metadata. It understands dozens of file formats, including PDF, XLS, and DOC documents, images, and audio and video files.


Transcript

  • 1. Hadoop Based SQL and Big Data Analytics Solution
  • 2. Hadoop Data Tagging and Metadata Extension
  • 7. What is MetaData?
      • Metadata is simply “data about data”.
      • In a file system, metadata is information about files, such as file size, creation time, last-modified time, file type, and owner.
      • The file system manages access to both the content of files and the metadata about those files.
      • Metadata characterizes data. It provides documentation so that data can be understood and more readily consumed by your organization. Metadata answers the who, what, when, where, why, and how questions for users of the data.
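
To make the file-system metadata described above concrete, here is a minimal Python sketch that reads a few of those fields for a file. It uses the local file system through the standard library rather than HDFS, and the helper name `basic_metadata` and the field names are illustrative assumptions, not part of QueryIO.

```python
import os
import tempfile
from datetime import datetime, timezone


def basic_metadata(path):
    """Return a few file-system metadata fields for one local file."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "owner_uid": info.st_uid,
        "extension": os.path.splitext(path)[1].lower(),
    }


# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("hello metadata")
    sample_path = handle.name

sample_metadata = basic_metadata(sample_path)
for key, value in sample_metadata.items():
    print(f"{key}: {value}")

os.remove(sample_path)
```

None of these fields requires opening the file: the file system keeps them alongside the content, which is what lets them be queried cheaply.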
  • 12. MetaData Extension with QueryIO
      • QueryIO provides an on-ingest metadata extraction service: extended metadata is extracted from files as they are ingested, so you don't need to run costly batch jobs later. This makes unstructured data on the cluster searchable as soon as it is ingested.
      • To help make unstructured data searchable on a Hadoop cluster, QueryIO stores the metadata for each file stored on Hadoop in a relational database.
      • Because all the metadata and tags associated with a file are kept in a relational database, you can use the existing infrastructure built around SQL to search the data on the Hadoop cluster.
      • It understands dozens of file formats, including PDF, XLS, and DOC documents, images, and audio and video files.
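
The idea of keeping per-file metadata in a relational database and searching it with SQL can be sketched as follows. This is only an illustration using Python's built-in `sqlite3` module: the table name `file_metadata`, its columns, and the sample rows are assumptions made for the example and do not reflect QueryIO's actual schema.

```python
import sqlite3

# In-memory database standing in for the relational metadata store.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE file_metadata (
        path TEXT PRIMARY KEY,
        file_type TEXT,
        size_bytes INTEGER,
        author TEXT
    )
    """
)

# Hypothetical metadata rows, as if extracted when the files were ingested.
rows = [
    ("/data/reports/q1.pdf", "pdf", 48213, "alice"),
    ("/data/reports/q2.pdf", "pdf", 51990, "bob"),
    ("/data/images/logo.png", "png", 2048, "alice"),
]
conn.executemany("INSERT INTO file_metadata VALUES (?, ?, ?, ?)", rows)

# Ordinary SQL now finds files by their metadata, without scanning file content.
pdf_paths = [
    row[0]
    for row in conn.execute(
        "SELECT path FROM file_metadata "
        "WHERE file_type = ? AND size_bytes > ? ORDER BY path",
        ("pdf", 50000),
    )
]
print(pdf_paths)
```

Because the query touches only the metadata table, any existing SQL tooling can be pointed at it to locate files of interest.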
  • 18. What are Data Tags?
      • A tag is a label attached to someone or something for identification or to give other information.
      • A data tag is a tag attached to data or a file to provide extra information about it.
      • Data tags can be used to categorize data by various criteria in order to manage vast amounts of data. The data can then be extracted, sorted, and processed based on these categories.
      • Adding data tags to data, either conditionally or unconditionally, is called data tagging.
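
As a rough illustration of categorizing data by tags, the sketch below groups file paths by the value of one tag. The paths, the tag names (`department`, `confidential`), and the helper `group_by_tag` are all hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical tag assignments: path -> {tag name: tag value}.
file_tags = {
    "/logs/app-01.log": {"department": "ops"},
    "/logs/app-02.log": {"department": "ops"},
    "/docs/contract.pdf": {"department": "legal", "confidential": "yes"},
    "/docs/readme.txt": {},
}


def group_by_tag(tags_by_path, tag_name):
    """Group paths by the value of one tag, skipping files without that tag."""
    groups = defaultdict(list)
    for path in sorted(tags_by_path):
        value = tags_by_path[path].get(tag_name)
        if value is not None:
            groups[value].append(path)
    return dict(groups)


by_department = group_by_tag(file_tags, "department")
for department, paths in sorted(by_department.items()):
    print(department, paths)
```

Once files are grouped this way, each category can be extracted or processed on its own.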
  • 25. Data Tagging with QueryIO
      • QueryIO provides manual and automated data tagging, which lets you define properties for files as they are written to HDFS. It automatically stores the basic metadata of files stored in HDFS and extends the metadata layer by letting you define additional metadata (data tags).
      • As with extracted metadata, the tags defined for all files on the cluster are stored in a relational database. QueryIO keeps the metadata and tags stored in the database in sync with the files stored on the Hadoop cluster.
      • Data tagging lets you define the data tag and operator to be applied to files on the cluster. You can define data tags using a table you have already created with Hive DDL, or choose system-defined MetaStore tables for different file formats. You can also provide expressions to apply tags conditionally, either on ingest or at a scheduled time.
      • [Figure: Data Tagging]
  • 31. Unconditional Data Tagging
      • QueryIO provides both conditional and unconditional tagging. With unconditional tagging, you can tag hand-picked files or all files in a particular folder.
      • To do so, open the HDFS data browser, choose the files you want to tag, and click the “add tag” button.
      • Unconditional tagging is useful when you already know the HDFS location of the files you want to tag and the tagging does not depend on other file attributes such as file type or file length.
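
In QueryIO this step is done through the HDFS data browser, but the underlying logic of unconditional tagging can be sketched in a few lines of Python. The function `tag_folder`, the in-memory `store` dictionary, and the sample paths are hypothetical stand-ins for illustration only.

```python
def tag_folder(tag_store, all_paths, folder, tag_name, tag_value):
    """Attach one tag to every path under `folder`, ignoring file attributes."""
    prefix = folder.rstrip("/") + "/"
    tagged = []
    for path in all_paths:
        if path.startswith(prefix):
            tag_store.setdefault(path, {})[tag_name] = tag_value
            tagged.append(path)
    return tagged


# Hypothetical HDFS-style paths; only the location decides what gets tagged.
paths = ["/data/2013/a.csv", "/data/2013/b.csv", "/data/2014/c.csv"]
store = {}
tagged_paths = tag_folder(store, paths, "/data/2013", "Year", "2013")
print(tagged_paths)
print(store)
```

The selection depends only on where a file lives, never on its type, length, or content, which is what distinguishes this from conditional tagging.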
  • 38. Conditional Data Tagging
      • Conditions can be defined using the values of file attributes (e.g., if Length > 1000) or by parsing the content of the file (e.g., if NumberOfLines > 100).
      • The tag value can also be obtained by parsing the content of the file (e.g., NumberOfWords, ifWord(“QueryIO”)Exist, ifPatternExists).
      • Conditional data tags can be added to chosen file types or to all files present on the HDFS cluster.
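
The conditions named on this slide can be approximated with a short Python sketch. It mirrors the examples above (Length > 1000, NumberOfLines > 100, NumberOfWords, a word check, and a pattern check), but the function `conditional_tags`, the tag names, and the date pattern are assumptions for illustration and not QueryIO's expression syntax.

```python
import re


def conditional_tags(content):
    """Derive tags from a file's length and from parsing its text content."""
    tags = {}
    length = len(content.encode("utf-8"))       # file attribute: size in bytes
    number_of_lines = len(content.splitlines())  # obtained by parsing content

    if length > 1000:
        tags["SizeClass"] = "large"
    if number_of_lines > 100:
        tags["LongFile"] = True

    # Tag values computed from the content itself.
    tags["NumberOfWords"] = len(content.split())
    tags["MentionsQueryIO"] = re.search(r"\bQueryIO\b", content) is not None
    tags["HasDate"] = re.search(r"\d{4}-\d{2}-\d{2}", content) is not None
    return tags


small = "QueryIO tags files on ingest\nreleased 2013-05-01\n"
large = "line of filler text\n" * 150

small_tags = conditional_tags(small)
large_tags = conditional_tags(large)
print(small_tags)
print(large_tags)
```

The same function could be run as each file arrives or over existing files on a schedule, matching the two application modes the deck mentions.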
  • 39. Download QueryIO now: http://QueryIO.com/download/big-data-analytics-download.html or take a demo: http://demo.QueryIO.com/queryio (“It's free”)