Big data refers to volumes of structured, semi-structured, and unstructured data so large that traditional data processing applications cannot handle them. This data comes from a wide variety of sources, including sensors, social media, and websites. Hadoop is an open-source software framework for distributed processing of large data sets across clusters of computers using simple programming models. It is commonly used by large companies for applications such as web search, data mining, and machine learning.
4. 4
Headlines
Data driven business
Data democratization
Data scientists
5. 5
The White House
+ $200M initiative
+ NSF: core techniques
+ NIH: 1000 genomes
+ DOE: advanced computing
+ DOD: data to decisions
+ USGS: Earth system
www.whitehouse.gov
11. 11
What is big data
+ Big data:
+ “Data you can’t process by traditional tools”
+ “A phenomenon defined by the rapid acceleration in the expanding volume of high velocity, complex and diverse types of data.”
+ “Refers to a collection of tools, techniques and technologies for working with data productively, at any scale.”
12. 12
What is Big data
+ 3V
+ Volume: petabytes (1000 TB) to exabytes (1000 PB)
+ Variety: structured, semi-structured, unstructured
+ Velocity: Tb/s data streams
+ Requires distributed processing
+ Big data = storage + processing
+ Big data = Hadoop (not only)
19. 19
GFS/HDFS
+ Distributed, replicated data blocks (64 MB)
+ Master-slave architecture (Name Node, Data Nodes)
+ Not a general file system
+ Access via command line utils and API
+ Files can’t be modified once written
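As a rough illustration of the block layout described on this slide, the following Python sketch splits a file into 64 MB blocks and assigns each block to several data nodes. The replication factor of 3, the round-robin placement, the node names, and the function names `split_into_blocks` and `place_replicas` are assumptions made for illustration; real HDFS placement is decided by the Name Node and takes rack topology into account.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size quoted on the slide


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block for a file of file_size bytes."""
    if file_size <= 0:
        return []
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes


def place_replicas(num_blocks, data_nodes, replication=3):
    """Toy round-robin placement: each block goes to `replication` distinct nodes."""
    if replication > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            data_nodes[(block_id + r) % len(data_nodes)]
            for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    sizes = split_into_blocks(200 * 1024 * 1024)  # a hypothetical 200 MB file
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical data nodes
    for block_id, replicas in place_replicas(len(sizes), nodes).items():
        print(block_id, sizes[block_id], replicas)
```

Because every block lives on several nodes, losing one data node in this sketch never removes the only copy of a block.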
20. 20
MapReduce
+ Scalable, because the programmer writes:
+ no file IO
+ no networking
+ no synchronization
+ Master-slave architecture
+ Master: divide, schedule, monitor work
+ Slave: actual processing
+ MapReduce programming model:
+ functional programming
+ like UNIX pipeline
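The programming model on this slide can be sketched as a single-process word count in Python. This is only a minimal illustration of the map, shuffle, and reduce steps, not Hadoop's API; the function names are hypothetical, and a real cluster would run many map and reduce tasks in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

The user supplies only `map_phase` and `reduce_phase`; the grouping in `shuffle` stands in for the work the framework does between the two, much like the pipe between two UNIX commands.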
21. 21
Data movement
+ store and process data on the same nodes
+ bring code to data, data “locality”
www.cloudera.com
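A minimal sketch of the "bring code to data" idea, assuming a hypothetical scheduler that knows which nodes hold each block and how many free task slots each node has. The function name `schedule` and the data structures are illustrative assumptions, not Hadoop's actual scheduler.

```python
def schedule(block_locations, free_slots):
    """Assign each block's task to a node that stores the block when possible.

    block_locations: dict mapping block id -> list of nodes holding a replica
    free_slots: dict mapping node -> number of tasks it can still accept
    Returns a dict mapping block id -> (node, is_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    for block_id, holders in block_locations.items():
        local = [node for node in holders if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fall back to any node with capacity; the block must then be
            # copied over the network, which locality tries to avoid.
            remote = [node for node, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError("no free slots left")
            node, is_local = remote[0], False
        slots[node] -= 1
        assignment[block_id] = (node, is_local)
    return assignment
```

Tasks flagged `is_local` read their input from the local disk; the others illustrate the network transfer that storing and processing data on the same nodes is meant to minimise.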
24. Data Base NoSQL 24
Revolution
+ Needed:
+ fast read/write time
+ high concurrency
+ easy horizontal scalability
+ Flat data structure
+ Sacrificed:
+ DB Schema
+ SQL
+ Transactions
27. 27
Hadoop tools
+ Pig
+ high level scripting language (PigLatin)
+ converts to MapReduce jobs
+ Hive
+ SQL-like queries on data in HDFS
+ converts to MapReduce jobs
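To illustrate what "converts to MapReduce jobs" means, the sketch below evaluates a query of the shape `SELECT key, SUM(value) ... GROUP BY key` as a map step, a sort/shuffle step, and a reduce step. It is plain Python, not Hive or Pig output; the function name `run_group_by_sum` and the column names in the usage note are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def run_group_by_sum(rows, key_col, value_col):
    """Evaluate a GROUP BY ... SUM query as map, shuffle, and reduce steps."""
    # Map: project each row to a (key, value) pair.
    pairs = [(row[key_col], row[value_col]) for row in rows]
    # Shuffle/sort: bring pairs with equal keys next to each other.
    pairs.sort(key=itemgetter(0))
    # Reduce: sum the values of each key group.
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(pairs, key=itemgetter(0))
    }
```

For example, calling `run_group_by_sum(rows, "country", "amount")` on a list of dicts with those two keys returns one total per country, which is the kind of aggregate a high-level query tool would hand off to the cluster as a job.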
34. 34
Cloudera
+ Enterprise support for Apache Hadoop
+ Founded 2008, funding $141 M
+ Employees: 230
+ Products:
+ CDH 4 (Cloudera distribution of Hadoop)
+ Impala
+ Consulting and training
www.cloudera.com
35. 35
MapR
+ Founded 2009, funding $20M
+ MapR Technologies is engineering game-changing Map/Reduce related technologies
+ Products:
+ M3, M5, M7
+ NFS, no single point of failure
+ NOT open source!
www.mapr.com
40. 40
Datameer
+ Founded 2009, funding $17.8M
+ Big data:
+ Data integration
+ Data Analytics
+ Data Visualization
www.datameer.com
41. 41
Datasift
+ Founded 2010, funding $29.7M
+ Data platform for social web
+ Aggregate and filter data
www.datasift.com
42. 42
Infochimps
+ Founded 2009, funding $5.5M
+ Transitioned from data marketplace to big data platform
+ End-to-end big data solution, real time
www.infochimps.com
44. Big data Startups 44
2012
+ Platfora, in memory BI on Hadoop
+ Sumologic, log file analysis
+ Hadapt, Hadoop + RDBMS
+ Metamarkets, patterns in data flow
+ DataStax, consulting, training
+ Karmasphere, BI, analytics on Hadoop
45. Big data startups 45
2013!
+ 10gen, MongoDB
+ ClearStory, big data aggregation + analytics
+ Continuuity, Hadoop API
+ Parstream, database analytics
+ Zoomdata, data visualization
+ The Climate Corporation, predictive analytics
47. 47
Big data processing
                    Batch processing    Interactive                Stream
Query time          minutes to hours    milliseconds to seconds    continuous
Data volume         TB to PB            GB to PB                   continuous
Programming model   MapReduce           Queries                    DAG
Users               Developers          Analysts                   Developers
Open source         Hadoop MapReduce    Drill, Impala              Storm, Kafka
48. 48
New technologies
+ Real-time querying
+ Drill (based on Google Dremel)
+ Impala (Cloudera)
+ Data stream processing
+ Storm (Twitter), real time analytics
+ Kafka (LinkedIn), messaging system
49. 49
Machine learning
+ Predictive analytics
+ Pattern discovery
+ Data mining
+ Tools:
+ Mahout
+ R
51. 51
Observations
+ Game changing technologies come from big companies
+ Open Source (!)
+ Start-up ecosystem
+ Less general, more specialized
+ Next step: big data analytics and visualization
52. 52
Data scientist
+ Machine Learning
+ Data Mining
+ Statistics
+ Software Engineering
+ Hadoop/MapReduce/HBase/Hive/Pig
+ Java, Python, C/C++, SQL
“By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.”
54. 54
Contacts
+ Leonid Zhukov, Ph.D.
+ School of Applied Mathematics and Information Science
Higher School of Economics, NRU-HSE
+ lzhukov@hse.ru
+ www.leonidzhukov.ru