Conclusion
- There are a lot of jobs dealing with large data sets.
- Take-home points:
- There is a rapidly increasing amount of data to be processed.
- Standard databases can help with relatively large data (< 2 GB).
- You can't process really big data well with just one machine.
- For really big data-mining tasks, you need distributed storage and
distributed processing.
- Reading: This week's reading is
Practical Advice for Analysis of Large Complex Data Sets.
You might also want to look at MAD Skills: New Analysis Practices for
Big Data by Cohen, Dolan, Dunlap, Hellerstein and Welton.
- Reading: Next week's reading is Brachman and Levesque, Chapter 6.