This document provides an overview of a presentation on big data and data science. It covers:
1. An introduction to key concepts in big data including architecture, Hadoop, sources of data, and definitions.
2. Details on common big data reference architectures from companies like IBM, Oracle, SAP, and open source technologies.
3. A discussion of how data science is disrupting various industries and the characteristics of firms using data science successfully.
4. Descriptions of machine learning techniques like segmentation, forecasting, and the overall reference architecture for machine learning involving data storage, signal extraction, and responding to insights.
Big data is a huge volume of heterogeneous data, often generated at high speed. Big data cannot be handled with traditional data analytics tools. Hadoop is one of the most widely used big data analytics tools. MapReduce, Hive, and HBase are also tools for analysis in big data.
Data Science Innovations : Democratisation of Data and Data Science suresh sood
Data Science Innovations : Democratisation of Data and Data Science covers the opportunity of citizen data science lying at the convergence of natural language generation and discoveries in data made by the professions, not data scientists.
Introduction to Data Science (Data Summit, 2017)Caserta
At DBTA's 2017 Data Summit in New York, NY, Caserta Founder & President, Joe Caserta, and Senior Architect, Bill Walrond, gave a pre-conference workshop presenting the ins and outs of data science. Data scientist has been dubbed the "sexiest" job of the 21st century, but it requires an understanding of many different elements of data analysis. This presentation dives into the fundamentals of data exploration, mining, and preparation, applying the principles of statistical modeling and data visualization in real-world applications.
Data science has been called the sexiest job of the 21st century, and big data is expected to shape that century. This presentation gives an overview of data science and big data.
Tools and Methods for Big Data Analytics by Dahl WintersMelinda Thielbar
Research Triangle Analysts October presentation on Big Data by Dahl Winters (formerly of Research Triangle Institute). Dahl takes her viewers on a whirlwind tour of big data tools such as Hadoop and big data algorithms such as MapReduce, clustering, and deep learning. These slides document the many resources available on the internet, as well as guidelines of when and where to use each.
Content:
Introduction
What is Big Data?
Big Data facts
Three Characteristics of Big Data
Storing Big Data
THE STRUCTURE OF BIG DATA
WHY BIG DATA
HOW IS BIG DATA DIFFERENT?
BIG DATA SOURCES
BIG DATA ANALYTICS
TYPES OF TOOLS USED IN BIG-DATA
Application Of Big Data analytics
HOW BIG DATA IMPACTS ON IT
RISKS OF BIG DATA
BENEFITS OF BIG DATA
Future of big data
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...Edureka!
( ** Hadoop Training: https://www.edureka.co/hadoop ** )
This Edureka tutorial on "Big Data Applications" explains how Big Data analytics can be used in various domains. The following topics are included in this tutorial:
1. Why do we need Big Data Analytics?
2. Big Data Applications in Health Care.
3. Big Data in Real World Clinical Analytics.
4. Big Data Analytics in Education Sector.
5. IBM Case Study in Education Sector.
6. Big Data applications and use cases in E-Commerce.
7. How does Government use Big Data analytics?
8. How is Big Data helpful in an E-Government Portal?
9. Big Data in IoT.
10. Smart city concept.
11. Big Data analytics in Media and Entertainment.
12. Netflix example in Big Data.
13. Future Scope of Big Data.
Check our complete Hadoop playlist here: https://goo.gl/hzUO0m
What is Big Data?
Big Data Laws
Why Big Data?
Industries using Big Data
Current process/SW in SCM
Challenges in SCM industry
How Big data can solve the problems?
Migration to Big data for an SCM industry
Big Data Applications | Big Data Application Examples | Big Data Use Cases | ...Simplilearn
In this Big Data presentation, we discuss the growth of big data over the last few years, followed by various big data applications. We look at the sectors where big data is used, such as weather forecasting, healthcare, media and entertainment, logistics, travel and tourism, and finally government and law enforcement.
We will discuss how the following industries are using Big Data:
1. Weather forecast
2. Media and entertainment
3. Healthcare
4. Logistics
5. Travel and tourism
6. Government and law enforcement
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem such as Hadoop 2.7, YARN, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Avro with Hive and Sqoop, and Schema evolution
7. Understand Flume, Flume architecture, sources, flume sinks, channels, and flume configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distributed datasets (RDD) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, including creating, transforming, and querying DataFrames
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
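The MapReduce model named in the course objectives above can be illustrated with a small word count. This is a minimal single-process sketch in plain Python, not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over a set of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data tools"]))
```

Keeping the three phases as separate functions mirrors how the framework separates them: only the map and reduce logic is application-specific, while grouping by key is generic.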
A look at what is driving Big Data: market projections to 2017, plus customer and infrastructure priorities, what drove Big Data in 2013, and what the barriers were. Also covers an introduction to business analytics and its types, building an analytics approach, ten steps to build an analytics platform within your company, and key takeaways.
What is big data ? | Big Data ApplicationsShilpaKrishna6
Big data is similar to ‘small data’ but bigger in size. The term describes a large volume of data, both structured and unstructured. Big data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.
Data Science Applications | Data Science For Beginners | Data Science Trainin...Edureka!
** Data Science Certification using R: https://www.edureka.co/data-science **
This Edureka "Data Science Applications" PPT takes you through the various domains in which data science is being deployed today, along with some potential applications of this technology. The world today runs on data and this PPT shows exactly that.
Check out our Data Science Tutorial blog series: http://bit.ly/data-science-blogs
Check out our complete Youtube playlist here: http://bit.ly/data-science-playlist
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAIBig Data Week
Charles Cai has more than two decades of experience and a track record of delivering global transformational programmes, from vision and evangelism to end-to-end execution, in global investment banks and energy trading companies. As Chief Front Office Technical Architect and Head of Data Science, he excels at designing and building innovative, large-scale Big Data systems for high-volume, low-latency trading, global Energy Trading & Risk Management, and advanced temporal and geospatial predictive analytics. He is also a frequent speaker at Google Campus, Big Data Innovation Summit, Cloud World Forum, Data Science London, QCon London and the MoD CIO Symposium, where he promotes knowledge and best-practice sharing with audiences ranging from developers and data scientists to CXO-level senior executives from both IT and business backgrounds. He has in-depth knowledge of and experience with the Scala, Python, C# / F#, C++, Node.js, Java, R and Haskell programming languages across Mobile, Desktop, Hadoop/Spark, Cloud IoT/MCU and BlockChain, and holds certifications including TOGAF9, EMC-DS and AWS CNE4.
What is Big Data? What is Data Science? What are the benefits? How will they evolve in my organisation?
Built around the premise that the investment in big data is far less than the cost of not having it, this presentation, made at a tech media industry event, explores the nuances of Big Data and Data Science and their synergy in forming Big Data Science. It highlights the benefits of investing in them and defines a path for their evolution within most organisations.
Big data course | big data training | big data classesNaviWalker
In today's world of digitization, data is an essential resource. Businesses in various fields use this data to get important ideas for their growth. This creates a sense of urgency to start learning Big Data. By doing so, you can stay productive and solve real-world problems.
Big Data helps to drive important business decisions. Furthermore, successful Big Data processing in large industrial sectors has taught important lessons on various Big Data concepts.
Big Data training with various Big Data Analytics courses will help you master data analysis. In the present world, you have ample scope to become a Big Data Scientist or to take on other Big Data job roles.
Data Science is a discipline that deals with large amounts of data, using modern data analysis tools and techniques to discover hidden patterns, derive meaningful insights, and make critical business decisions.
A Data Science professional uses complex machine learning algorithms to develop predictive models. The data used in analysis may come from multiple sources in different formats.
The fundamentals and best practices of securing your Hadoop cluster are top of mind today. In this session, we will examine and explain the components, tools, and frameworks used in Hadoop for authentication, authorization, audit, and encryption of data and processes. See how the latest innovations can let you securely connect more data to more users within your organization.
MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...APNIC
MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Datasets, by Yuji Sekiya.
Presented at the APNIC 40 APOPS 1 session, Tue 8 Sep 2015.
Enterprise Approach towards Cost Savings and Enterprise AgilityNUS-ISS
Presented by Mr Poon See Hong, Deputy Director (Planning), Police Logistics Department, Singapore Police Force, at our 14th Architecture Community of Practice Forum on 21 Jul 2016.
Building Hadoop Data Applications with Kite by Tom WhiteThe Hive
With such a large number of components in the Hadoop ecosystem, writing Hadoop applications can be a big challenge for newcomers. In this talk Tom looks at best practices for building data applications that run on Hadoop, and introduces the Kite SDK, an open source project created at Cloudera with the goal of simplifying Hadoop application development by codifying many of these best practices.
Meet with Tom White:
Tom White is one of the foremost experts on Hadoop. He has been an Apache Hadoop committer since February 2007, and is a Member of the Apache Software Foundation. Tom is a software engineer at Cloudera, where he has worked, since its foundation, on the core distributions from Apache and Cloudera. Previously he was an independent Hadoop consultant, working with companies to set up, use, and extend Hadoop. He has written numerous articles for O’Reilly, java.net and IBM’s developerWorks, and has spoken at many conferences, including ApacheCon and OSCON. Tom has a B.A. in mathematics from the University of Cambridge and an M.A. in philosophy of science from the University of Leeds, UK. He currently lives in Wales with his family.
Balancing Mobile UX & Security: An API Management Perspective Presentation fr...CA API Management
Chief Architect Francois Lascelles gave this presentation at Gartner Catalyst 2013. The user experience associated with mobile applications is a critical determinant of the adoption of the APIs that power them. Mobile platforms and their public app stores create challenges when it comes to securing APIs consumed by mobile applications in a way that does not require constant user prompts. This presentation describes the challenge of providing positive UX patterns such as single sign-on on mobile platforms and explores API provider-side architectures enabling them.
This presentation introduces and illustrates the use cases, benefits and problems of Kerberos deployment on Hadoop, and shows how Token support and TokenPreauth can help solve those problems. It also briefly introduces the Haox project, a Java client library for Kerberos.
Real time big data analytical architecture for remote sensing applicationLeMeniz Infotech
Big Data Analytics (BDA) is rapidly turning out to be a significant global enterprise need. It aims to facilitate the storage, querying and analysis of enterprise big data, which is getting more complicated and time-consuming with traditional database technologies. Apache Hadoop is a well-known Open-source BDA enterprise solution which is seeing an annual application growth rate of 60% globally.
With the rise of Apache Hadoop, a next-generation enterprise data architecture is emerging that allows organizations to efficiently rein in their big data business transactions. Hadoop is uniquely capable of storing, aggregating, querying and analyzing big data sources into formats that fuel new business insights. Organizations that embrace solution architectures focused on maximizing data-driven insights will put themselves in a position to drive more business, enhance productivity, maintain competitive edge or discover new and lucrative business opportunities. Over the coming years, Hadoop could be in a position to process more than half the world’s data.
To educate organizations about how best to leverage Apache Hadoop as a key component of their enterprise big data architecture, Innovative Management Services is pleased to host the 1st annual Open-BDA Hadoop Summit 2014 which is scheduled to be held on 18th & 19th November, 2014 at Marriott Hotel, Karachi.
Are you excited to learn Big Data technologies? Do you feel that the free material the internet is loaded with is too complicated for a newbie?
There are many things that may go wrong when learning a new technology. Free internet material can sometimes be a can of worms for a beginner, and training is advised for a jump start.
The Open-BDA Big Data Hadoop Developer Training, to be held on 11th & 12th May 2015 at the Marriott Hotel, Karachi, will cover everything you need to know to start a career in Hadoop technology and build expertise to a level where you can take certification exams with MapR, Cloudera & Hortonworks with confidence. You can start as a beginner, and this course will help you become a certified professional.
As Hadoop becomes a critical part of Enterprise data infrastructure, securing Hadoop has become critically important. Enterprises want assurance that all their data is protected and that only authorized users have access to the relevant bits of information. In this session we will cover all aspects of Hadoop security including authentication, authorization, audit and data protection. We will also provide demonstration and detailed instructions for implementing comprehensive Hadoop security.
Hadoop 2.0: YARN to Further Optimize Data ProcessingHortonworks
Data is exponentially increasing in both types and volumes, creating opportunities for businesses. Watch this video and learn from three Big Data experts: John Kreisa, VP Strategic Marketing at Hortonworks, Imad Birouty, Director of Technical Product Marketing at Teradata and John Haddad, Senior Director of Product Marketing at Informatica.
Multiple systems are needed to exploit the variety and volume of data sources, including a flexible data repository. Learn more about:
- Apache Hadoop 2 and YARN
- Data Lakes
- Intelligent data management layers needed to manage metadata and usage patterns as well as track consumption across these data platforms.
50 Shades of Data - how, when and why Big,Relational,NoSQL,Elastic,Event,CQRS...Lucas Jellema
Data has been and will be the key ingredient of enterprise IT. What is changing is the nature, scope and volume of data and the place of data in the IT architecture. BigData, unstructured data and non-relational data stored on Hadoop, in NoSQL databases and held in Elastic Search indexes, caches and message queues complement data in the enterprise RDBMS. Emerging patterns such as microservices that contain their own data, BASE, CQRS and Event Sourcing have changed the way we store, share and govern data. This session introduces patterns, technologies, trends and hypes around storing, processing and retrieving data using products such as MongoDB, MySQL, Kafka, Redis, Elastic Search and Hadoop/Spark, running locally, in containers and on the cloud.
Key takeaway: what an application architect and a developer should know about the various types of data in enterprise IT and how to store, manage, query and manipulate them; what products and technologies are at your disposal; and how you can make these work together for a consistent (enough) overall data presentation.
These are the slides for the presentation as well as all the demos I prepared for the Devoxx Morocco event in November 2017. The deck includes 150+ slides showing the setup of the demo environment (Oracle Public Cloud DBaaS, Event Hub, Application Container, Application Cache, Kubernetes and Kafka) and the detailed demo steps that show Microservices with Data Bounded Context, Event based choreography and CQRS in action.
Take Action: The New Reality of Data-Driven BusinessInside Analysis
The Briefing Room with Dr. Robin Bloor and WebAction
Live Webcast on July 23, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=360d371d3a49ad256942f55350aa0a8b
The waiting used to be the hardest part, but not anymore. Today’s cutting-edge enterprises can seize opportunities faster than ever, thanks to an array of technologies that enable real-time responsiveness across the spectrum of business processes. Early adopters are solving critical business challenges by enabling the rapid-fire design, development and production of very specific applications. Functionality can range from improved customer engagement to dynamic machine-to-machine interactions.
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Robin Bloor, who will tout a new era in data-driven organizations, and why a data flow architecture will soon be critical for industry leaders. He’ll be briefed by Sami Akbay of WebAction, who will showcase his company’s real-time data management platform, which combines all the component parts needed to access, process and leverage data big and small. He’ll explain how this new approach can provide game-changing power to organizations of all types and sizes.
Visit InsideAnalysis.com for more information.
50 Shades of Data - Dutch Oracle Architects Platform (February 2018)Lucas Jellema
Gone are the days of a single enterprise database, typically an Oracle RDBMS, that holds all data in a strictly normalized form. We work with many more types of data (big and fast, structured and unstructured) that we use in various ways. Relational storage and ACID guarantees are not applicable to all of those. A requirement for always having the latest, freshest data may not be uniformly valid either. We will continue to see an increase in specialized data stores that cater for specific needs and specific scenarios.
This session combines a presentation and a demonstration on the various dimensions and use cases of data and data stores, while ensuring the appropriate (!) levels of freshness, integrity and performance. Key takeaway: what you as an architect should know about the various types of data in enterprise IT and how to store, manage, query and manipulate them; what products and technologies are at your disposal; and how you can make these work together for a consistent (enough) overall data presentation. How are upcoming architectural patterns such as CQRS (command query responsibility segregation), event sourcing and microservices influencing the way we handle data in the enterprise? Some of the technologies discussed: products such as MongoDB, MySQL, Neo4J, Apache Kafka, Redis, Elastic Search and Hadoop/Spark, and Oracle Data Hub Cloud (based on Apache Cassandra), used locally, in containers and on the cloud. Additionally, we will discuss data replication scenarios.
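The CQRS and event sourcing patterns mentioned in this abstract can be sketched in a few lines. The example below is a minimal in-memory illustration under stated assumptions, not code from the presentation: the class names (`Event`, `EventStore`, `CommandHandler`, `BalanceView`) and the bank-account scenario are hypothetical. Commands only append events, and queries only read a separately maintained view.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Event:
    """An immutable fact: an amount was deposited into an account."""
    account: str
    amount: int


@dataclass
class EventStore:
    """Append-only log of events; the single source of truth."""
    events: list[Event] = field(default_factory=list)
    subscribers: list[Callable[[Event], None]] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)
        for notify in self.subscribers:
            notify(event)  # push the new event to every read model


class CommandHandler:
    """Write side: validates commands and records them as events."""

    def __init__(self, store: EventStore) -> None:
        self.store = store

    def deposit(self, account: str, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit amount must be positive")
        self.store.append(Event(account, amount))


class BalanceView:
    """Read side: a query-optimised projection built from events."""

    def __init__(self) -> None:
        self.balances: dict[str, int] = {}

    def apply(self, event: Event) -> None:
        current = self.balances.get(event.account, 0)
        self.balances[event.account] = current + event.amount


if __name__ == "__main__":
    store = EventStore()
    view = BalanceView()
    store.subscribers.append(view.apply)

    commands = CommandHandler(store)
    commands.deposit("acct-1", 100)
    commands.deposit("acct-1", 50)
    print(view.balances)

    # A read model can be rebuilt at any time by replaying the log.
    rebuilt = BalanceView()
    for past_event in store.events:
        rebuilt.apply(past_event)
    print(rebuilt.balances == view.balances)
```

Because the event log is the source of truth, additional read models (for example, one stored in a cache or a search index) could be added later and populated by replaying the same events.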
A Winning Strategy for the Digital EconomyEric Kavanagh
The speed of innovation today creates tremendous opportunities for some, existential threats for others. Companies that win create their own success by leveraging modern data platforms. While architectures vary, the foundation is often in-memory, and the latency is real-time. Register for this Special Edition of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain how today's data platforms enable the modern enterprise in groundbreaking ways. He'll be briefed by Chris Hallenbeck of SAP who will demonstrate how forward-looking companies are leveraging real-time data platforms to achieve operational excellence, make decisions faster, and find new ways to innovate.
A modern data platform meets the needs of each type of data in your businessMarcos Quezada
For a little over 20 years our customers have confidently built the databases for their business-critical applications on robust commercial databases such as Oracle and DB2 on Power Systems. As the digital transformation of their companies evolves, driven by the migration to mobile and web platforms, they face the need to extract more value from their most prized asset: their data.
Many companies now need to start exploring and exploiting other types and other volumes of data. For them, Cognitive Systems presents solutions for a modern data platform based on key-value, document, graph, open-source and parallel databases such as Hadoop.
Getting Started with Data Virtualization – What problems DV solvesDenodo
Experts and analysts agree on data virtualization's strategic role in enterprise architecture for increasing agility and flexibility in the delivery of information. In this presentation, you will find how data virtualization enables organizations to access, manage, and integrate data from a wide variety of data sources.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/IS9RGK.
Best practices to deliver data analytics to the business with power biSatya Shyam K Jayanty
Get your data to life with Power BI visualization and insights!
With the changing landscape of Power BI features it is essential to get hold of configuration and deployment practices within your data platform that will ensure you are on-par with compliance & security practices. In this session we will overview from the basics leading into advanced tricks on this landscape:
How to deploy Power BI?
How to implement configuration parameters and package BI features as a part of Office 365 roll out in your organisation?
What are newest features and enhancements on this Power BI landscape?
How to manage on-premise vs on-cloud connectivity?
How can you help and support the Power BI community as well?
Within the objectives of this session, cloud computing is another aspect of this technology: it has made it possible to get data to the end-user within a few clicks. Let us review how to manage and connect on-premise data to cloud capabilities that take full advantage of data catalogue features while keeping data secure as per Information Governance standards. Beyond the nuts and bolts, performance is another aspect every admin has to keep up with, so we will look into a few settings that maximize performance and optimize access to data as required. Gain understanding and insight into the number of tools that are available for your Business Intelligence needs. There will be a showcase of events to demonstrate where to begin and how to proceed in the BI world.
- D BI A Consulting
consulting@dbia.uk
Moving Targets: Harnessing Real-time Value from Data in Motion Inside Analysis
The Briefing Room with David Loshin and Datawatch
Live Webcast Feb. 17, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=4a053043c45cf0c2f6453dfb8577c72a
Patience may be a virtue, but when it comes to streaming analytics, waiting is no option. Between Big Data and the Internet of Things, businesses are faced with more data and greater complexity than ever before. Traditional information architectures simply cannot support the kind of processing necessary to make use of this fast-moving resource. The modern context requires a shorter path to analytics, one that narrows the gap between governance and discovery
Register for this episode of The Briefing Room to hear veteran Analyst David Loshin as he explains how the prevalence of streaming data is changing business pace and processes. He’ll be briefed by Dan Potter of Datawatch, who will tout his company’s real-time data discovery platform for data in motion. He will show how self-service data preparation can lead to faster insights, ultimately fostering the ability to make precise decisions at the right time.
Visit InsideAnalysis.com for more information.
So you got a handle on what Big Data is and how you can use it to find business value in your data. Now you need an understanding of the Microsoft products that can be used to create a Big Data solution. Microsoft has many pieces of the puzzle and in this presentation I will show how they fit together. How does Microsoft enhance and add value to Big Data? From collecting data, transforming it, storing it, to visualizing it, I will show you Microsoft’s solutions for every step of the way
AI disruption is everywhere; all the businesses are impacted vertically & horizontally.
In this regard, we would like to propose AI Offerings for Students, which by your support, would help in Creating future force ready
For more details email to mahesh@orbitshifters.com
Leaders across the world are looking out for different strategies thru which they can leverage AI.
Realizing this we have successfully organized an event on "AI 4 Institution Leaders" at Nasik focused on the need for AI for educational institutions for the first time in India.
AI strategy is going to be a source of competitive advantage for the business in the marketplace. More so than before, organisations embarking on an AI journey require a map to chart a course through innumerable and significant obstacles that come their way.
DO NOT begin the journey without an AI blueprint map. It’s quite possible you could get lost without it.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
2. What we will cover in the 60 mins
1. Technology Basics
2. Big Data Overview & Snapshot
3. Big Data Architecture : Deep Dive
4. Hadoop Overview
5. Clear Understanding of Data Science
6. Big Data Career Opportunities
7. Q & A
3. Apart from that we will also cover …
• An overview of the shift to Data Science Platforms
• The 3 critical components of a Data Science platform
• Industries that are most likely to get disrupted and shift to Data Science
• Characteristics of firms that get left behind the Data Science wave
• Factors that push an industry towards Data Science
• A brief overview of aspects of platform architecture beyond technology
4. Who am I ?
• Mahesh Kumar CV is a Big Data entrepreneur
• Mahesh has about 14 years of experience in architecting and developing distributed and real-time data-driven systems.
• Specialties: Translating big data into action, Big Data Trainings, Product Engineering Services, and Building Big Data CoE & Big Data Incubators
• Has written more than 60 blogs on Big Data & SAP Analytics
• Has worked in the past with IBM, Mindtree, CSC & Rolta
• Has conducted a couple of boot camps & workshops in different companies
5. Data Vs Information
• Data refers to a collection of numbers, characters and is a relative term;
• Data is Raw, Facts , Figures etc
• Information is processed data
8. So where is this data getting generated ?
• Social Networking and Media: 700 million Facebook users, 250 million Twitter users, 175+ million public blogs. Each Facebook update, Tweet, blog post and comment creates multiple new data points: structured, semi-structured and unstructured.
• Mobile Devices: 5 billion mobile phones in use worldwide. Each call, text and instant message is logged as data. Mobile devices, particularly smart phones and tablets, also make it easier to use social media.
• Internet Transactions: Billions of online purchases, stock trades and other transactions happen every day, including countless automated transactions. Each creates a number of data points collected by retailers, banks, credit cards, credit agencies and others.
• Networked Devices and Sensors: Electronic devices of all sorts – including servers and other IT hardware, smart energy meters and temperature sensors – all create semi-structured log data that record every action.
9. Build Vs Buy
[Figure: data types growing from BIG DATA TODAY to BIG DATA TOMORROW on a 1X / 10X / 100X scale]
• HUMAN DRIVEN: EMAIL, WEB LOGS, DOCUMENTS, SOCIAL
• MACHINE DRIVEN: SATELLITE IMAGES, BIO-INFORMATICS, M2M LOG FILES, SENSORS, VIDEO, AUDIO
• BUSINESS DRIVEN: OLTP
• ALL DATA TYPES
10. Defining Big Data
“Any amount of data that's too BIG to be handled by one computer” (John Rauser)
11. Why Big Data
• 12 TB of Tweets in a day
• 80% of the world’s data is unstructured
• 30 billion pieces of content shared on Facebook every month
• Expected data in 2020 would be 35 ZB
• 5 million trade events per second
• 2.267 billion Internet users
• 4.7 billion searches on Google per day
• 5 billion people tweet, text, call and browse on mobile phones daily
• Walmart handles 1 million transactions per hour
• 255 million websites
12. Big Data Reference Architecture
• Data sources: Structured Data Sources (Legacy applications, ERP and CRM applications); Unstructured/Semi-structured Data Sources (Web logs, Application / Network log, Social, Chat transcripts, Emails); External feeds; Instrumentation data / Sensors, RFID, Telematics, Time and Location data
• Data Integration (Batch / Near real-time): Data Extraction; Data Cleaning and Transformation; Change Data Capture for Structured Data; Change Data Capture; Real-time Streaming/Integration
• Data Repositories: MDM, ODS, Data Warehouse, DW Appliances, Data Marts, MOLAP Cube, In-memory Databases, Columnar Databases, Unstructured / Semi-structured data
• Analytics: Data Mining and Exploration, Predictive Analytics, Text Analytics
• End User Analytics: Reports / Dashboards, Scorecards and Metrics, Events and Alerts, Visual Exploration, Mobile BI
13. Big Data Reference Architecture
SAP
[Diagram: the reference architecture with SAP products mapped onto its components; the exact component-to-product mapping is not recoverable from the extracted text.]
• Data sources: Structured Data Sources (Legacy and ERP); Unstructured/Semi-structured Data Sources (Web logs, Application / Network log, Social, Chat transcripts, Emails); External feeds; Instrumentation data / Sensors, RFID, Telematics, Time and Location data
• Data Integration: Data Extraction, Transformation; Data Quality; CDC for Structured data; Change Data Capture; Real-time Streaming / Integration
• Data Repositories: MDM, ODS, DW, DW Appliance, Data Marts, MOLAP Cube, In-memory Databases, Columnar Databases, Unstructured / Semi-structured
• Analytics: Data Mining, Predictive Analytics, Text Analytics
• End User Analytics: Reports, Dashboards, Scorecards / Metrics, Events / Alerts, Data Exploration, Mobile BI
• Products shown: SAP HANA, HANA / BW / Sybase, Sybase IQ / HANA, SAP HANA / Sybase, SAP HANA / SAP BW, SAP BW, RDS / Rapid Marts, SAP MDM, SAP BO DataServices, BO CMS, BO WebI / Crystal Reports, BO dashboard, BO Mobile, SAP Lumira, SAP Predictive Analysis, Hadoop Platform, 3rd Party
14. Big Data Reference Architecture
IBM
[Diagram: the reference architecture with IBM products mapped onto its components; the exact component-to-product mapping is not recoverable from the extracted text.]
• Data sources: Structured Data Sources (Legacy Applications and ERP); Unstructured/Semi-structured Data Sources (Web logs, Application / Network log, Social, Chat transcripts, Emails); External feeds; Instrumentation data / Sensors, RFID, Telematics, Time and Location data
• Data Integration: Data Extraction; Data Quality; CDC for Structured Data; CDC for Unstructured Data; Real-time Streaming
• Data Repositories: MDM, ODS, Data Warehouse, DW Appliance, Data Marts, MOLAP Cube, In-memory Databases, Columnar Databases, Semi / Unstructured, Hadoop Platform
• Analytics: Predictive Analytics, Text Analytics, Content Analytics, Analytics Sandbox
• End User Analytics: Reports, Dashboards, Scorecards / Metrics, Events / Alerts, Visual Exploration, Mobile BI
• Products shown: InfoSphere Information Server, InfoSphere Streams, InfoSphere CDC, InfoSphere MDM, InfoSphere Data Explorer, Big Insights, Big Insights / Streams, Big Insights / NoSQL, Big Insights / HBase, PureData (Netezza, InfoSphere Warehouse), PureData (Netezza, InfoSphere Warehouse, ISAS), Cognos Business Intelligence Enterprise, Cognos TM1, Cognos Mobile, SPSS Premium, SPSS, Content Analytics
15. Big Data Reference Architecture
ORACLE
[Diagram: the reference architecture with Oracle products mapped onto its components; the exact component-to-product mapping is not recoverable from the extracted text.]
• Data sources: Structured Data Sources (Legacy Applications and ERP); Unstructured/Semi-structured Data Sources (Web logs, Application / Network log, Social, Chat transcripts, Emails); External feeds; Instrumentation data / Sensors, RFID, Telematics, Time and Location data
• Data Integration: Data Extraction; Data Quality; CDC for Structured Data; CDC for Unstructured Data; Real-time Streaming
• Data Repositories: MDM, ODS, Data Warehouse, DW Appliance, Data Marts, MOLAP Cube, In-memory Databases, Columnar Databases, Semi / Unstructured, Hadoop Platform
• Analytics: Data Mining, Predictive Analytics, Text Analytics, Analytics Sandbox
• End User Analytics: Reports, Dashboards, Scorecards / Metrics, Real Time Decision Mgt., Visual Exploration, Mobile BI
• Products shown: Data Integrator, Data Integrator / Golden Gate, Oracle Golden Gate, Hadoop / Golden Gate, Silver Creek, Oracle MDM, Exadata, Oracle / Exadata, Exadata EHCC / HBase, Big Data Appliance, Essbase / Hyperion, Exalytics, Exalytics + Oracle R Ent., Endeca, BI Publisher, OBI Foundation Suite, OBI Scorecard, OBI Mobile, Real-time Decisions
16. Big Data Reference Architecture
Informatica+EMC+SAS
[Diagram: the reference architecture with Informatica, EMC and SAS products mapped onto its components; the exact component-to-product mapping is not recoverable from the extracted text.]
• Data sources: Structured Data Sources (Legacy Applications and ERP); Unstructured/Semi-structured Data Sources (Web logs, Application / Network log, Social, Chat transcripts, Emails); External feeds; Instrumentation data / Sensors, RFID, Telematics, Time and Location data
• Data Integration: Data Extraction; Data Quality; CDC for Structured Data; CDC for Unstructured Data; Real-time Streaming
• Data Repositories: MDM, ODS, Data Warehouse, DW Appliance, Data Marts, MOLAP Cube, In-memory Databases, Columnar Databases, Semi / Unstructured, Hadoop Platform
• Analytics: Data Exploration, Predictive Analytics, Text Analytics, Analytics Sandbox
• End User Analytics: Reports, Dashboards, Scorecards / Metrics, Visual Exploration, Mobile BI
• Products shown: Informatica PowerCenter & DataQuality, Informatica PowerCenter – Real-time edition, Informatica hParser / Hadoop Pwx, Informatica MDM, EMC GreenPlum, EMC GreenPlum Database, EMC GreenPlum UAP, EMC Greenplum HD, HBase, SAS BI, SAS Visual Analytics, SAS Visual BI, SAS OLAP Server, SAS Ent. Miner, SAS Strategy Mgt, SAS Text Miner, JMP Pro
17. Big Data Reference Architecture
Open Source Technologies
[Diagram: the reference architecture with open source technologies mapped onto its components; the exact component-to-product mapping is not recoverable from the extracted text.]
• Data sources: Structured Data Sources (Legacy Applications and ERP); Unstructured/Semi-structured Data Sources (Web logs, Application / Network log, Social, Chat transcripts, Emails); External feeds; Instrumentation data / Sensors, RFID, Telematics, Time and Location data
• Data Integration: Data Extraction; Data Quality; CDC for Structured Data; CDC for Unstructured Data; Real-time Streaming
• Data Repositories: MDM, ODS, Data Warehouse, DW Appliance, Data Marts, MOLAP Cube, In-memory Databases, Columnar Databases, Semi / Unstructured, Hadoop Platform
• Analytics: Predictive Analytics, Text Analytics, Analytics Sandbox
• End User Analytics: Reports, Dashboards, Scorecards / Metrics, Visual Exploration, Mobile BI
• Products shown: Apache MapReduce, Pig; Talend Data Integration & Data Quality; Talend MDM; Apache Flume; Apache Hadoop; Apache HDFS + R; HBase; HBase, NoSQL; MySQL, Apache Hive; MySQL, Hive; Apache Derby; R, Apache Mahout; Pentaho Business Analytics, BI; Pentaho Mobile BI; SAS OLAP Server; SAS Text Miner; Commercial Product
18. What is Hadoop
• It’s a framework for large-scale data processing
• Inspired by Google’s architecture
• A top-level Apache project – Hadoop is open source
• Written in Java, plus a few shell scripts
• An open-source software framework that supports data-intensive distributed applications
• Abstracts and facilitates the storage and processing of large and rapidly growing data sets
• Structured and non-structured data
• Simple programming models
20. Hadoop is used everywhere
• Yahoo! : More than 100,000 CPUs in ~20,000 computers running Hadoop; biggest cluster: 2000 nodes (2*4cpu boxes with 4TB disk each); used to support research for Ad Systems and Web Search
• AOL : Used for a variety of things ranging from statistics generation to running advanced algorithms for behavioral analysis and targeting; cluster size is 50 machines, Intel Xeon, dual processors, dual core, each with 16GB RAM and 800 GB hard-disk, giving a total of 37 TB HDFS capacity
• Facebook : To store copies of internal log and dimension data sources and use them as a source for reporting/analytics and machine learning; 320 machine cluster with 2,560 cores and about 1.3 PB raw storage
• FOX Interactive Media : 3 X 20 machine cluster (8 cores/machine, 2TB/machine storage); 10 machine cluster (8 cores/machine, 1TB/machine storage); used for log analysis, data mining and machine learning
• NetSeer : Up to 1000 instances on Amazon EC2; data storage in Amazon S3; used for crawling, processing, serving and log analysis
• Powerset / Microsoft : Natural Language Search; up to 400 instances on Amazon EC2; data storage in Amazon S3
21. HDFS : High level architecture
• HDFS Follows a master-slave architecture
• 2 Major Daemons in HDFS –
• Name Node
• Data Node
• Master : Name Node
• Responsible for namespace and metadata
• Namespace : file hierarchy
• Metadata : ownership, permissions, block locations etc
• Slave : DataNode
• Responsible for storing actual data blocks
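The split between the two daemons can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not how HDFS is implemented: the class names `NameNode` and `DataNode` mirror the slide, but the `write`/`read` methods, the 8-byte `BLOCK_SIZE` and the round-robin placement are hypothetical choices made for illustration, and replication is left out.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 8  # bytes; deliberately tiny for the demo (an assumption, real blocks are far larger)


@dataclass
class DataNode:
    """Slave: stores the actual data blocks."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes


@dataclass
class NameNode:
    """Master: keeps namespace and metadata (block locations), never file contents."""
    datanodes: list
    block_map: dict = field(default_factory=dict)  # path -> [(block_id, datanode name)]

    def write(self, path: str, data: bytes) -> None:
        placements = []
        for offset in range(0, len(data), BLOCK_SIZE):
            index = offset // BLOCK_SIZE
            block_id = f"{path}#{index}"
            node = self.datanodes[index % len(self.datanodes)]  # hypothetical round-robin placement
            node.blocks[block_id] = data[offset:offset + BLOCK_SIZE]
            placements.append((block_id, node.name))
        self.block_map[path] = placements

    def read(self, path: str) -> bytes:
        by_name = {node.name: node for node in self.datanodes}
        return b"".join(by_name[name].blocks[bid] for bid, name in self.block_map[path])


nodes = [DataNode("dn1"), DataNode("dn2")]
namenode = NameNode(nodes)
namenode.write("/logs/app.log", b"hello big data world")
print(namenode.block_map["/logs/app.log"])
```

The point of the design is that a client only asks the master where blocks live; the bytes themselves sit on the slaves.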
22. MapReduce : High Level Architecture
• MapReduce has a master-slave architecture too
• 2 Daemon processes
• Master : Job Tracker
• Responsible for dividing, scheduling and monitoring work
• Slave : Task Tracker
• Responsible for actual processing
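The programming model behind this architecture can be sketched in a single process with the classic word count. This is a minimal illustration, assuming nothing about the Hadoop API: `map_phase`, `shuffle`, `reduce_phase` and `run_job` are hypothetical names, and `run_job` only stands in for the master's job of dividing the input and collecting results.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Stand-in for the master: split the input into map tasks, then run the reduce tasks."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = run_job(["big data is big", "data science uses data"])
print(counts)
```

In a real cluster each call to `map_phase` and `reduce_phase` would run as a separate task on a slave node, close to the data blocks it reads.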
31. What's common to the following game changing solutions ?
• Japanese dating app
• Sensor-equipped cows in the Netherlands
• Google's autonomous car
• MOOC
• Heart implants
32. At the core there is a deep
embedded DATA PRODUCT !
33. Created by DATA SCIENCE !
Conquer the world ! Become Data Scientist
34. The world around is changing… Our lives are intimately surrounded by Data products (an intimate fabric of our lives)
• How our health gets cared for ?
• How we learn ?
• How we fall in love ?
• How we do farming ?
• How we drive ?
35. How did the following players disrupt the Marketplace ?
• Amazon defeated Borders ( Books )
• Netflix defeated Blockbuster ( Video )
• iTunes defeated Tower Records ( Music )
• Google defeated Yahoo ( Search ) – PageRank algorithm
36. If Data Science is not integral you are no longer in the game
38. In a Nutshell
• Data Science is the extraction of knowledge from data
• Data Science is the art of turning data into actions
• The ability to take data: to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it
• Data Science seeks to
• Extract meaning from data
• Create "Data Products"
• Use all available data to tell a valuable story to non-practitioners
The future belongs to the companies and people that turn data into products
41. “As is” state in most organizations
Data
( Sales , Finance )
Reports
( BO, Cognos, MSAS )
42. “As is” stage with leading game changers
Data repository
Insights
Analytics cell + Modeling processes
( Segment, Score, Text mine )
Move from Reports to Insightful Actions that Impact
43. What are the 4 core differences between Data Science & Dashboards ?
Dashboards: Data repository → Dashboards
Data Science: Data repository (Purchase habits) → Signal (Similar people discovery) → ML process (Collaborative filtering) → Actions (Recommend a product) → Outcomes (Improve cross sell)
ML + Signals + Actions = Game Changing Outcomes
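The chain on this slide (purchase habits, similar people discovery, collaborative filtering, recommend a product) can be sketched in a few lines. This is one simple user-based variant, assumed for illustration only: the customer names, the products and the choice of cosine similarity over purchase sets are all hypothetical, not taken from the presentation.

```python
from math import sqrt

# Hypothetical purchase habits: customer -> set of products bought.
purchases = {
    "asha": {"laptop", "mouse", "keyboard"},
    "ben": {"laptop", "mouse", "monitor"},
    "chen": {"novel", "bookmark"},
}


def similarity(a, b):
    """Cosine similarity between two purchase sets (0.0 when either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(customer, k=1):
    """Signal: find the k most similar customers. Action: suggest what they bought."""
    own = purchases[customer]
    neighbours = sorted(
        (other for other in purchases if other != customer),
        key=lambda other: similarity(own, purchases[other]),
        reverse=True,
    )[:k]
    suggestions = set()
    for other in neighbours:
        suggestions |= purchases[other] - own  # only products the customer lacks
    return sorted(suggestions)


print(recommend("asha"))
```

A dashboard would stop at showing the purchase table; the data product goes on to act on the signal, which is what drives the cross-sell outcome.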
44. What exactly is a model ?
• Mathematically defining a real world phenomenon
• Representative of the real world
• For example, a cross sell model
45. What are 3 common things between predictive models and caricatures ?
• It's an approximation, not a perfection
• It's better than not having anything
• It gets the job done
REAL WORLD
ANALYTICAL MODEL
46. What's the Goal of Data Science ?
Use data to discover Signals (patterns) that cause changes that impact $.
47. Data Science Reference Architecture – Key components
Hadoop
Hive
Hana
Infobright
Clustering
Text mining
Mobile
Digital
Data Ingestion Pipeline
48. Machine Learning Reference Architecture
1. STORE ( Hadoop, Hive, HANA, Cloudera, Splunk, Hortonworks )
2. SENSE ( signal extraction: text mining, scoring models )
3. RESPOND ( Front line actions thru website, call centre )
49. Snapshot of Machine Learning Techniques
1. Segmentation
• Customer behavior segmentation
• Defect segmentation
• Employee segmentation model
• Supplier segmentation model
• “Chunking” groups
• Discovered by algorithm
2. Text mining
• Convert messy unstructured text into actionable signals
• Keyword frequencies
• Sentiment ratios
• Blogs
• Call center transcripts
• Emails
• Multi channel sentiment analysis
3. Forecasting
• Predict CLTV
• Predict Sales at a neighborhood outlet
• Predict Salary based on experience, qualification, rating, market demand
• Identify drivers of behavior
• Weights processing
4. Visual Analytics
• Beyond line, bar, pie charts
• Geospatial modeling to see geo correlation
• Spread analysis
• Outlier detection
5. Scoring models
• Churn propensity
• Cross sell
• Attrition modeling in HR
• Risk scoring models in Banking
• Logistic regression
• Neural networks
• Decision trees
• Support Vector machines
6. Optimisation
• Constraint modeling
• Maximize an outcome
• Maximize sales without cannibalizing sister brands
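As an illustration of the segmentation technique in this snapshot (groups "discovered by algorithm"), here is a minimal k-means sketch. The slides do not name an algorithm, so k-means is an assumption, and the customer data, the starting centroids and the function name `kmeans` are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        centroids = [
            # Mean of each dimension; an empty cluster keeps its old centroid.
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers as (visits per month, average basket value).
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 78)]
centroids, segments = kmeans(customers, centroids=[(0, 0), (10, 100)])
print(segments)
```

The two segments that come out (occasional low spenders and frequent high spenders) were never labelled in the input, which is what distinguishes segmentation from the scoring models listed above.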
53. Real world Unstructured text mining in health care
Inputs: Doctors transcripts, Lab diagnostics, Nurses Observations
• Step-1 : SPLIT - Split sentences into words/tokens
• Step-2 : FILTER - Filter “noise” words eg : I, the, is, was
• Step-3 : STEMMING - ‘Pulmonary’ = ‘pulmonar’; ‘Insomnia’ = ‘Sleep’ = ‘Sleeplessness’
• Step-4 : THEME EXTRACTION - Keyword extraction & Theme generation
• Step-5 : THEME / KEYWORD ANALYSIS
Outputs: Cardiac watch list, Oncology watch list, Pulmonary watch list, Diabetic watch list, Schizophrenia watch list
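The five steps above can be sketched as a small pipeline. This is a rough illustration under stated assumptions: the stop-word list, the `CANONICAL` lookup table (a stand-in for real stemming) and the keyword-to-watch-list mapping in `THEMES` are all hypothetical, as are the sample notes.

```python
import re
from collections import Counter

STOP_WORDS = {"i", "the", "is", "was", "a", "of", "and", "has", "with"}

# Hypothetical lookup that maps word variants to one canonical keyword.
# A real pipeline would use a proper stemmer and a medical vocabulary.
CANONICAL = {"pulmonar": "pulmonary", "sleeplessness": "insomnia", "sleepless": "insomnia"}

# Hypothetical keyword -> watch list mapping.
THEMES = {"pulmonary": "Pulmonary watch list", "cardiac": "Cardiac watch list"}


def extract_keywords(note):
    tokens = re.findall(r"[a-z]+", note.lower())         # Step 1: SPLIT
    tokens = [t for t in tokens if t not in STOP_WORDS]  # Step 2: FILTER
    return [CANONICAL.get(t, t) for t in tokens]         # Step 3: normalise variants


def watch_lists(notes):
    keywords = Counter(k for note in notes for k in extract_keywords(note))  # Step 4: keyword counts
    return {THEMES[k]: n for k, n in keywords.items() if k in THEMES}        # Step 5: theme analysis


notes = [
    "The patient has pulmonary congestion",
    "Pulmonar scan was clear, cardiac rhythm is normal",
]
result = watch_lists(notes)
print(result)
```

Counting normalised keywords per theme is the simplest possible form of theme analysis; it is enough to show how free-text notes end up routed to a watch list.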
57. Industries disrupted by Data Science
• Telecom: Infrastructure optimisation, Network security
• Banking: Customer sentiment, Multi channel analysis
• Digital channel: Consumer engagement, Recommendation engines
• Automotive: Autonomous cars, GM's OnStar
• Health care: Wearables
• Oil n Gas: Operations optimisation
• Retail: Digitisation
58. What factors are driving companies towards data science ?
• Competitive advantage in the market place ( get ahead fast using unique insights )
• Existential threat ( others are moving ahead fast and I need to catch up )
• Revenue enhancement ( Cross sell models, recommenders )
• Cost optimisation ( Operational efficiency )
70. FAQ-1: “I am confused between Hadoop and Data Science … What's the difference between Hadoop and Data Science?”
• Hadoop = Data Infrastructure layer
• Data Science = Sensing patterns from data to impact business outcome
71. FAQ-2 : “I have worked on SAP, Oracle, etc. How do I transition to becoming a Data Scientist ?”
• Execute your first Data Science pilot
• Step-1 : Learn R
• Step-2 : Zero in on a business problem to solve
• Step-3 : Set up the R connector for your technology … Get access to data from your technology
• Step-4 : Apply an Analytical construct ( VEDA ML )
• Step-5 : Discover the pattern which impacts the outcome
• Step-6 : Present final results to executive business team
• Explore setting up a Data science project within existing organisation
• Meetups to explore the outside world
72. FAQ-3: “Should I know probability and advanced statistics ?”
• Not really
• We are focussed on APPLICATION and not THEORY underpinning it
• We will teach you
• Business problem to solve
• How to execute the command on a platform
• What to look for in the output
• What happens within the black box can be seen later
73. FAQ-4: “This is a big shift for me … In your experience how long does it take to make the transition from IT to Data Science ?”
• We have seen people make the transition in anywhere from 4 weeks to about 6 months
• It depends upon the time + passion + drive you have
74. FAQ-5: “How are we going to prepare you for the data science
job market ?”
1. Mock preparatory sessions
2. Worksheets + Modelling Checklists + Data Science Playbooks
3. Live projects on clustering , scoring which can be put in resume
4. Our strategic tie-ups with Organisations looking for data science skills
5. Top 30 Practitioner generated Data Science questions
75. FAQ-6: “I am not an IT professional but a domain person. How
can I get started ?”
1. Option-1 : Focus on Industry use cases
2. Option-2 : Take basic introduction to data sciences
76. Big Data Resources
• datasciencecentral.com
• bigdatauniversity.com
• coursera.org
• Big Data Architecture
• Spotting Signals in Big Data
• Signal Extraction Methodology
• Advanced Visualization in Big Data
• Exploratory Data Analysis (EDA) : Quick Deep Dive
• Best practices in designing dashboards and scorecards
• Exploring Big Data Using Bivariate Analysis
• Where to start looking in Big Data using Univariate Analysis
• Big Data Platform & Applications
• Statistics Role in Data Science
• Applied Mathematics Role in Data Science
• Data-Scientist-playbook
• 5-disruption-data-products By Data Science
77. All The Best
Happy Hadooping & Dating with Data Science
Conquer the world !
Become Data Scientist