2. Knowledge discovery in agriculture

From many
areas of human activity, a growing amount of data is becoming available that can
be used for the betterment of the world. In the Human Genome Project, the code of life
has been read, but it is not yet known how life works; complex analysis must be
performed on such a myriad of data. Another example is the Web, where useful
relationships between pages can be found to improve
search results. Similarly, in agriculture, various sensors continuously record
many parameters such as humidity, temperature, images, and sound.
Many challenges exist in analyzing and finding useful
information in such an ocean of data, and Data Mining (DM) is designed to
address problems such as the ones mentioned above.


The Government of Gujarat has devised a new approach by launching the Soil Health Card Program
(SHCP), an online program of technology transfer that focuses on the condition
of each individual farm. This program is expected to bridge the gap between
scientists, farmers, and input-output dealers effectively. It helps make the
transfer of technology more scientific, precise, easy, and need-based. The Soil
Health Card System is a web-based information system designed to run in a
networked environment, including intranet, internet, and GSWAN (Gujarat State
Wide Area Network). It is a repository of agricultural information for the
benefit of farmers, agricultural scientists, and policy makers. The program
generates fertilizer recommendations for each field on the basis of soil
analysis and the nutrient requirements of the crop, which
increases fertilizer efficiency and reduces consumption.

Machine learning is programming computers to optimize a performance criterion using example data
or past experience. We need learning in cases where we cannot directly write a
computer program to solve a given problem but have example data or experience.
One case where learning is necessary is when human expertise does not exist, or
when humans are unable to explain their expertise. To be intelligent, a system
operating in a changing environment should have the ability to learn: if the
system can learn and adapt to such changes, the system designer need not
foresee and provide solutions for all possible situations.

There are many forms of the data mining approach, and machine learning is one of the most
widely used. Machine learning is a domain focused on developing
algorithms that allow computers to learn to solve problems based on past
records [15]. Data mining is the science of discovering knowledge from databases.
A database contains a collection of instances (records or cases), and each
instance used by machine learning and data mining algorithms is formatted with
the same set of fields (features, attributes, inputs, or variables). When the
instances contain the correct output (class label), the learning process is
called supervised learning [16]. The other machine learning approach,
clustering, which works without knowing the class labels of the instances, is called
unsupervised learning [17]. The focus of this research is on classification and
clustering for the agricultural Soil Health Card database.


3. Motivation of the problem


In ML, classification
is the task of assigning each record to one of several predefined
classes. The training set is the one whose class labels are known; together with
the learning algorithm, it is used to build a classification model, which is then
applied to the test set, whose class labels are unknown [18]. Individual records
in the training data are represented as rows. A classifier's input is a set of
records, where each record is formed by an attribute set, and the class label is
a special attribute that is always discrete. The
learning process of classification is actually finding a target function that maps
each attribute set to one of the predefined class labels; this target function is
also called the classification model. The classification algorithm employs a
learning algorithm to identify the model that best fits the relationship
between the attribute set and the class label of the input data. However, some
classification algorithms do not build a model, but make the
classification decision by comparing the test set with the training set each
time they perform classification. These algorithms are known as instance-based
learning algorithms.
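The model-building ("eager") side of this process can be sketched with a minimal nearest-centroid classifier: a model (one centroid per class) is learned from the training set and then applied to test records whose labels are unknown. The attribute values and class names are hypothetical, used only to illustrate the workflow.

```python
import numpy as np

# Toy training set: each row is a record (attribute set); y holds the class labels.
# Hypothetical soil-style attributes: [pH, nitrogen]; 0 = "low fertility", 1 = "high".
X_train = np.array([[5.5, 0.2], [5.8, 0.3], [7.1, 0.8], [7.4, 0.9]])
y_train = np.array([0, 0, 1, 1])

# Eager learning: build a model (here, one centroid per class) from the training set.
centroids = {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}

def predict(x):
    """Target function: map an attribute set to the class whose centroid is nearest."""
    return int(min(centroids, key=lambda c: np.linalg.norm(x - centroids[c])))

# Apply the learned model to unseen test records.
X_test = np.array([[5.6, 0.25], [7.0, 0.85]])
print([predict(x) for x in X_test])  # → [0, 1]
```

An instance-based algorithm such as KNN would skip the model-building step entirely and compare each test record against the stored training records at classification time.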


The k-nearest neighbor (KNN) algorithm is a nonparametric
classifier: with KNN, a sample is assigned to the class that predominates among
its k closest instances. KNN therefore has a very simple implementation, and its
key advantages are that it has few parameters (namely the distance metric and
the value of k) and that it tolerates noise between classes, meaning the classes
need not be linearly separable, because it does not depend on a distribution
model of the data but only on the nearest neighbors. Still, KNN, like any other
classifier, has disadvantages. Although it withstands noise, the attributes must
be chosen carefully: inappropriate or overly noisy attributes may bias the
estimates. The biggest problem of KNN, and the major focus of this work, is that
KNN is a lazy-learning algorithm: to classify each instance, it needs to
calculate the distances to all N other known instances (training samples). The
KNN algorithm therefore has a time complexity of O(N·D) for each classification,
where D is the number of attributes that characterizes a sample. This scenario is
problematic for large data sets and/or data with a large number of attributes. In
many domains (e.g. agriculture, multispectral images, text categorization, biometrics, or retrieval from multimedia databases)
the data set is so large that real-time systems cannot meet the
time and storage requirements needed to process it. Under
such conditions, classification becomes a very costly task for algorithms
such as KNN. In addition, since the NN rule stores every instance in the
Training Set (TS), noisy instances are stored as well, which can considerably
degrade the classification accuracy.
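A brute-force sketch makes the O(N·D) cost concrete: every query recomputes a distance to all N stored training samples, each costing O(D) work. The training values and class names below are hypothetical, for illustration only.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Brute-force KNN: each query costs O(N*D) distance work over all N
    training samples with D attributes, plus a sort -- the lazy-learning
    cost paid at every single classification."""
    dists = np.linalg.norm(X_train - x, axis=1)      # N distances, each O(D)
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)     # majority vote among them
    return votes.most_common(1)[0][0]

# Toy training set (hypothetical values, for illustration only)
X_train = np.array([[1.0, 1.0], [1.2, 0.9], [4.0, 4.2], [3.8, 4.0]])
y_train = np.array(["A", "A", "B", "B"])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # → A
```

Because no model is built in advance, nothing here amortizes across queries; this is exactly why large N and D make KNN expensive at classification time.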


In the Machine
Learning (ML) literature, many proposals for reducing the size of the training
set by removing some of the training instances have been discussed; these are
generally referred to as prototype selection [5]. Two prototype selection
approaches can be distinguished: first, editing approaches, which eliminate
erroneously labelled instances; second, condensing approaches, which aim to
select a small subset of instances without a significant degradation of the
resultant classification accuracy.
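The condensing idea can be sketched in a few lines, in the spirit of Hart-style condensing (a simplified illustration, not the algorithm evaluated in this work): starting from a seed, an instance is retained only if the current subset misclassifies it under the 1-NN rule, so redundant interior instances are dropped.

```python
import numpy as np

def condense(X, y):
    """Naive condensing sketch: grow a subset `keep` by adding every instance
    that the current subset misclassifies with the 1-NN rule. The retained
    indices form a (hopefully much smaller) prototype set."""
    keep = [0]                                   # seed the subset with one instance
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            d = np.linalg.norm(X[keep] - X[i], axis=1)
            if y[keep][np.argmin(d)] != y[i]:    # misclassified by current subset
                keep.append(i)
                changed = True
    return sorted(keep)

# Two well-separated toy classes (hypothetical values): one prototype per class suffices.
X = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [5.0, 5.0], [5.1, 5.2]])
y = np.array([0, 0, 0, 1, 1])
print(condense(X, y))  # → [0, 3]
```

Classifying against the condensed subset instead of the full training set reduces the per-query O(N·D) cost discussed above, at the risk of some accuracy loss.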