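Just as "googling" has come to mean searching, "splunking" is understood among IT people to mean searching IT data.
Splunk is not a monitoring solution but a next-generation data processing engine.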
Many of you are familiar with SQL and with writing SQL queries against relational databases. How is Splunk different from using a relational database to query for insights? With a relational database, you need to create a good schema for the data at design time (and hopefully not have to change the schema later). This means you also need to know what types of queries you are likely to run against the data in order to design a good schema. As for the data, you need to figure out what the tables should look like and how to convert the data to fit into them.
With Splunk, there is no need to specify a schema. We create structure at search time, so the queries and searches can be totally ad hoc. The data can come from a variety of text-based sources and can continuously evolve without anyone having to rework data formats.
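To make "structure at search time" concrete, here is a minimal Python sketch. It is not how Splunk is implemented; the sample events, the key=value extraction rule and the helper names (`extract_fields`, `search`) are all assumptions chosen for illustration. The point is only that raw text is stored as-is and fields are derived when a query runs.

```python
import re
from collections import Counter

# Hypothetical raw events. No table or schema is defined up front, and
# events are free to carry different fields (latency_ms appears only once).
RAW_EVENTS = [
    "2013-03-01T10:00:01 action=purchase status=200 user=alice",
    "2013-03-01T10:00:02 action=purchase status=503 user=bob latency_ms=950",
    "2013-03-01T10:00:05 action=login status=200 user=carol",
]

# Assumed extraction rule: any key=value pair found in the raw text.
KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def extract_fields(raw):
    """Derive fields from a raw event at query time, not at load time."""
    return dict(KV_PATTERN.findall(raw))


def search(events, **criteria):
    """Yield (raw, fields) for events whose extracted fields match all criteria."""
    for raw in events:
        fields = extract_fields(raw)
        if all(fields.get(key) == value for key, value in criteria.items()):
            yield raw, fields


# Two ad hoc questions asked of the same raw data, with no redesign in between.
failed_purchases = list(search(RAW_EVENTS, action="purchase", status="503"))
status_counts = Counter(extract_fields(raw).get("status") for raw in RAW_EVENTS)
```

Because nothing about the events was fixed at load time, a new field such as `latency_ms` becomes searchable as soon as it shows up in the text.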
APIs are the basic components that allow developers to use the languages and tools they already know, lowering the barrier to entry for developers.
They allow developers to build part of an application, or an entire application, in an environment they are already comfortable with. From APIs to tools to entire development environments, Splunk will integrate into the existing framework.
We are making it easy for analysts, not just developers, to use Splunk. By providing a way to integrate with a variety of analytical tools, Splunk will become a key component of an analyst's toolkit.
Anyone doing data analysis will be able to start with tools they're familiar with, and get real value from Splunk. These analysts are, by and large, the ones who will be using Splunk to drive a new level of business insight, bringing out the value of Operational Intelligence.
Splunk is announcing the planned availability of a new software package called Splunk Enterprise™ with Hadoop. This new offering will include Splunk Enterprise™, the Splunk Hadoop integration layer and Apache™ Hadoop™. The Splunk Hadoop integration layer will provide more than just point-to-point connectivity and is planned to support the following operations:
• Issuing MapReduce queries, or higher-level queries (using Pig or Hive, for example), from the Splunk search language and pulling the resulting data sets back into Splunk
• Indexing the output of Hadoop jobs in Splunk
• Indexing data stored in HDFS in Splunk
• Delivering data from Splunk to HDFS
• Calling Splunk APIs directly from Hadoop jobs
According to IDC, unstructured data, much of it generated by machines, accounts for more than 90% of the data in today’s organizations. All websites, communications, networking and complex IT infrastructures generate massive streams of machine data every second of every day, in an array of unpredictable formats that are difficult to process and analyze by traditional methods or in a timely manner.
Machine data is one of the fastest growing and most pervasive segments of “big data”—generated by websites, applications, servers, networks, mobile devices and all the sensors and RFID assets that produce data every second of every day. It’s also one of the most valuable, containing a definitive record of user transactions, customer behavior, sensor activity, machine behavior, security threats, fraudulent activity and more. Traditional technologies predominantly built on relational databases cannot handle the complexity or massive scale of today’s machine data. Nor do they allow the flexibility to ask any question or get questions answered in real time—which is now an expectation of users. By monitoring and analyzing everything from customer clickstreams and transactions to network activity and call records —and more—Splunk software turns machine data into valuable insights no matter what business you’re in. It’s what we call operational intelligence.
Individual components in your infrastructure generate hundreds of events per second. A datacenter can log many terabytes of data per day. Making use of this data, however, presents real challenges. Existing data analysis, management and monitoring solutions are simply not engineered for this high-volume, high-velocity and highly diverse data. Consider traditional information management systems, such as business intelligence and data warehouse tools. These systems are batch-oriented and designed for structured data with rigid schemas. IT management and security information and event management tools on the other hand, provide a very narrow view of the underlying data and are hard-wired for specific data types and sources. They also don’t provide historical context. Finding a better way to sift, distill and understand the vast amounts of machine data can transform how IT organizations manage, secure and audit IT. It can also provide valuable insights for the business on how to innovate and offer new services, as well as trends and customer behaviors.
Machine data complexity, meaning simply getting to the data, is a real challenge. Let's take the example of a customer calling a service desk. We have a customer in Boston who used to have 36 people on the phone for up to 8 hours while they tried to figure out why the core website was down. And it is not just a problem for IT; it can harm the business. The escalation typically runs like this:
• The customer calls the service desk. The service desk logs the call and escalates (red light/green light monitoring shows everything as green).
• The issue is escalated to app support, who look at Java monitoring tools. Everything looks fine because those tools rely on instrumentation, but app support has no access to the logs.
• A developer gets pulled in and has to stop working on new code.
• The developer needs to ask a sysadmin for the logs.
• The developer establishes that it is not his problem and escalates to the database admin.
• The database admin looks at the audit logs and points to a bad query.
We call this "human latency", and the customers we talk to say it can consume hours or sometimes days of precious time when issues occur.
Unlike traditional structured data or multi-dimensional data– for example data stored in a traditional relational database for batch reporting – machine data is non-standard, highly diverse, dynamic and high volume. You will notice that machine data events are also typically time-stamped – it is time-series data.
Take the example of purchasing a product on your tablet or smartphone: the purchase transaction fails, you call the call center and then tweet about your experience. All these events are captured - as they occur - in the machine data generated by the different systems supporting these different interactions.
Each of the underlying systems can generate millions of machine data events daily. Here we see small excerpts from just some of them.
When we look more closely at the data we see that it contains valuable information – customer id, order id, time waiting on hold, twitter id … what was tweeted.
What’s important is first of all the ability to actually see across all these disparate data sources, but then to correlate related events across disparate sources, to deliver meaningful insight.
If you can correlate and visualize related events across these disparate sources, you can build a picture of activity, behavior and experience. And what if you can do all of this in real-time? You can respond more quickly to events that matter.
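The correlation step described here can be sketched in a few lines of Python. This is an illustration only, not Splunk's search language or implementation: the events, the shared `customer_id` field, the source names and the 30-minute window are all assumptions made up for the example.

```python
from datetime import datetime, timedelta

# Hypothetical, already-parsed events from three different systems.
EVENTS = [
    {"source": "order_system", "time": "2013-03-01 10:00:02",
     "customer_id": "C42", "detail": "order failed"},
    {"source": "call_center", "time": "2013-03-01 10:05:30",
     "customer_id": "C42", "detail": "on hold 300s"},
    {"source": "twitter", "time": "2013-03-01 10:12:10",
     "customer_id": "C42", "detail": "complaint tweet"},
    {"source": "order_system", "time": "2013-03-01 10:01:00",
     "customer_id": "C7", "detail": "order ok"},
]


def parse_time(event):
    """Turn the event's timestamp string into a datetime."""
    return datetime.strptime(event["time"], "%Y-%m-%d %H:%M:%S")


def correlate(events, key, window):
    """Build a time-ordered timeline per key value.

    A timeline is kept only if it spans more than one source and all of
    its events fall within the given time window.
    """
    groups = {}
    for event in sorted(events, key=parse_time):
        groups.setdefault(event[key], []).append(event)

    timelines = {}
    for value, grouped in groups.items():
        sources = {event["source"] for event in grouped}
        span = parse_time(grouped[-1]) - parse_time(grouped[0])
        if len(sources) > 1 and span <= window:
            timelines[value] = grouped
    return timelines


timelines = correlate(EVENTS, "customer_id", timedelta(minutes=30))
for event in timelines["C42"]:
    print(event["time"], event["source"], event["detail"])
```

Joining on a shared identifier and ordering by timestamp is what turns three unrelated log excerpts into one picture of a single customer's experience.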
You can extrapolate this example to a wide range of use cases – security and fraud, transaction monitoring and analysis, web analytics, IT operations and so on.
Using Splunk, organizations identify and resolve issues up to 70% faster and reduce costly escalations by up to 90%. Splunk is one place to find and fix problems, and investigate incidents across all your IT systems and infrastructure - your applications, websites, servers, networks, virtual machines, security devices, and more. This alone eliminates much of the "human latency" experienced in the trenches.
It’s fair to ask “what’s so different about this new generation of data?” After all, haven’t data volumes always been growing?
The answer is yes, data is always growing. Some types of data are more mature. For example, business application data that comes from accounting systems, databases, and the like. This data is well understood, highly structured, and is usually managed by relational databases and OLAP systems. This data is growing more slowly – and the technologies to manage it are quite capable.
There is also human-generated data, such as documents, text messages, and video. Technologies like Google are doing a great job of harvesting, indexing, and managing human-generated data. Document management systems handle some of this information, and those technologies are well known and mature.
What’s new about machine data are the massive volumes of data that are being generated by devices, like servers, web streams, and mobile technologies. This data has highly diverse formats, and time is a critical dimension. It also contains human-generated data. This is the data that Splunk manages – this is the world of machine data.
Splunk is as important to the world of machine data as the relational database is to structured data, or as Google is to text data.
Splunk’s flagship product is Splunk Enterprise. Splunk Enterprise is a fully featured, powerful platform for collecting, searching, monitoring and analyzing machine data. Splunk collects machine data securely and reliably from wherever it’s generated. It stores and indexes the data in real time in a centralized location and protects it with role-based access controls. You can even leverage other data stores. Splunk lets you search, monitor, report and analyze your real-time and historical data. Now you have the ability to quickly visualize and share your data, no matter how unstructured, large or diverse it may be. Troubleshoot problems and investigate security incidents in minutes (not hours or days). Monitor your end-to-end infrastructure to avoid service degradation or outages. Gain real-time visibility and critical insights into customer experience, transactions and behavior. Use Splunk and make your data accessible, usable and valuable across the enterprise.
Splunk collects and indexes any machine data from virtually any source, format or location in real time. This includes data streaming from packaged and custom applications, app servers, web servers, databases, networks, virtual machines, telecoms equipment, OS’s, sensors, and much more. There’s no requirement to “understand” the data upfront. Just point Splunk at your data or deploy Splunk forwarders to reliably stream data from remote systems at scale. Splunk immediately starts collecting and indexing, so you can start searching and analyzing. No more armies of consultants, or a DBA to make it work.
Here's how using Splunk and your machine data can drive significant benefits for your organization. Search and investigation. Using Splunk, organizations identify and resolve issues up to 70% faster and reduce costly escalations by up to 90%. Splunk is one place to find and fix problems, and investigate incidents across all your IT systems and infrastructure. Proactive monitoring. Monitor IT systems in real time to identify issues, problems and attacks before they impact your customers, services and revenue. Splunk keeps watch of specific patterns, trends and thresholds in your machine data so you don't have to. Trigger notifications in real-time via email or RSS, execute a script to take remedial actions, send an SNMP trap to your system management console or generate a service desk ticket. Operational visibility. See the whole picture, track performance and make better decisions. Visualize usage trends to better plan for capacity; spot SLA infractions, track how you are being measured by the business. Do all of this using your existing machine data without spending millions of dollars instrumenting your IT infrastructure. Real-time business insight. Make better-informed business decisions by understanding trends, patterns and gaining Operational Intelligence from your machine data. See the success of new online services by channel or demographic, reconcile 3rd-party service provider fees against actual use, find your heaviest users and heaviest abusers, and more. Because machine data captures every behavior, the possibilities are game changing. You'll find the lead times to get to this intelligence dramatically less than other solutions - measured in minutes/hours instead of months.
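The proactive monitoring idea (watch for a threshold, then trigger an action) can be illustrated with a small Python sketch. This is not Splunk's alerting mechanism; the class name `ThresholdAlert`, its parameters and the sample event stream are hypothetical. The callback stands in for whatever action is configured, such as sending an email or opening a service desk ticket.

```python
from collections import deque


class ThresholdAlert:
    """Call on_alert when more than `limit` matching events fall within `window_s` seconds."""

    def __init__(self, limit, window_s, on_alert):
        self.limit = limit
        self.window_s = window_s
        self.on_alert = on_alert
        self._times = deque()  # timestamps of recent matching events

    def observe(self, timestamp, is_match):
        """Feed one event; timestamps are assumed to arrive in increasing order."""
        if not is_match:
            return
        self._times.append(timestamp)
        # Drop matches that have fallen out of the sliding window.
        while self._times and timestamp - self._times[0] > self.window_s:
            self._times.popleft()
        if len(self._times) > self.limit:
            self.on_alert(timestamp, len(self._times))


# Hypothetical stream of (seconds, HTTP status) pairs.
alerts = []
monitor = ThresholdAlert(limit=2, window_s=60,
                         on_alert=lambda t, n: alerts.append((t, n)))
for ts, status in [(0, 500), (10, 200), (20, 500), (30, 500), (200, 500)]:
    monitor.observe(ts, status >= 500)
```

In this run the third server error within 60 seconds triggers the callback, while the isolated error at 200 seconds does not.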
Both IT and business professionals can analyze machine data to get real-time visibility and operational intelligence. With our data engine and our customers' machine data, organizations can meaningfully improve their performance in a wide range of areas e.g. meet service levels, reduce costs, mitigate security risks, maintain compliance and gain insights.
We have been seeing innovation with Splunk outside of IT in a range of exciting new areas.
"Personal Activity Monitoring": Devices like Fitbit tell me how active a person is in a given day. Fitbit has an open API that allows me to track my offline movements and analyze them online. I can correlate my daily activity with all sorts of other measurements, such as calorie intake, blood pressure and maybe even the number of unread emails in my inbox on a given day, and start to correlate health-related activities with work productivity.
"Building Power Consumption": Splunk indexes data from "power taps" in buildings and correlates it with power-tap location information to provide real-time insight into and analysis of power consumption per floor, area or room. Users can also drill down to identify the reason for any excessive power consumption and trigger automatic remote shut-off to save energy (on weekends, based on power levels, etc.). Several organizations are Splunking power consumption to look for cost savings and environmental benefits.
"Flood Monitoring Warning": Developed by a partner in Thailand in conjunction with the Thai government. Splunk collects, indexes and monitors water level sensor data in real time and alerts subscribers in advance of any impending flood.
You can share and reuse Apps within your organization and the rest of the Splunk community. There are a growing number of Apps available on our community site www.splunkbase.com, built by our community, partners and Splunk. You can find Apps that help visualize data geographically, or that support specific use cases, such as enterprise security or PCI compliance. There are also Apps for different operating systems and third-party technologies, such as Windows, Linux, Blue Coat, Cisco, WebSphere and F5 Networks. Apps are being created all the time, so bookmark the site and check in frequently. Examples on this page include Apps for Cisco, F5, for BlueCoat, an award winning “Google Maps” App, Apps to gauge Twitter sentiment, external ‘WHOIS’ lookups, license usage, and more.
As of 12/29/10: Overall Progress, 5/15 References. Targets 1TB (6): Citrix, T-Mobile, LinkedIn, Salesforce, Comcast, MetroPCS
Since June 2006, more than 1,600 users have purchased the enterprise license (as of Feb 2010). These enterprise customers now use Splunk across a wide and balanced range of industries, including telecommunications, financial services, government and large consumer-facing internet services. Last year, 2009, over 650 new customers started using Splunk. Like the customer examples we just saw, these customers have transformed the management, security and compliance of their IT infrastructures with IT Search.
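Human-generated data
1. Customer information
2. Logistics and manufacturing information
3. Financial and credit information
4. Sales and transaction records

Machine data
• Data with a time factor applied
• Unquantified data whose structure or format cannot be predicted
• All data generated by all IT systems from many vendors
• A vast variety of data
• A vast volume of data
• Demand for fast retrieval, analysis and correlation analysis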